Abstract



In this paper we present an almost linear time algorithm for solving approximate maximum
flow in undirected graphs. In particular, given a graph with $m$ edges we show how to produce
a $(1 - \epsilon)$-approximate maximum flow in time $O(m^{1+o(1)} \cdot \epsilon^{-2})$. Furthermore, we present
this algorithm as part of a general framework that also allows us to achieve a running time
of $O(m^{1+o(1)} \epsilon^{-2} k^2)$ for the maximum concurrent $k$-commodity flow problem, the first such
algorithm with an almost linear dependence on $m$. We also note that independently Jonah
Sherman has produced an almost linear time algorithm for maximum flow and we thank him
for coordinating submissions.

1 Introduction

Given a graph G = (V, E) in which each edge $e \in E$ is assigned a nonnegative capacity $u_e$, the
maximum s-t flow problem asks us to find a flow $f$ that routes as much flow as possible from
a source vertex s to a sink vertex t while sending at most $u_e$ units of flow over each edge $e$. Its
generalization, the maximum concurrent multicommodity flow problem, supplies $k$ source-sink
pairs $(s_i, t_i)$ and asks for the maximum $\alpha$ such that we may simultaneously route $\alpha$ units of
flow between each source-sink pair. That is, it asks us to find flows $f_1, \ldots, f_k$ (which we
think of as corresponding to $k$ different commodities) such that $f_i$ sends $\alpha$ units of flow from $s_i$ to
$t_i$, and $\sum_i |f_i(e)| \le u_e$ for all $e \in E$.

These problems lie at the core of graph algorithms and combinatorial optimization and have been
extensively studied over the past 60 years [26, 1]. They have found a wide range of theoretical
and practical applications [ ], and they are widely used as key subroutines in other algorithms
(see [ , 27]).

In this paper, we introduce a new framework for approximately solving flow problems in capacitated,
undirected graphs and apply it to provide asymptotically faster algorithms for the maximum s-t
flow and maximum concurrent multicommodity flow problems. For graphs with $n$ vertices and
$m$ edges, it allows us to find an $\epsilon$-approximately maximum s-t flow in time $O(m^{1+o(1)} \epsilon^{-2})$,
improving on the previous best bound of $\tilde{O}(mn^{1/3}\,\mathrm{poly}(1/\epsilon))$ [ ]. Applying the same framework in
the multicommodity setting solves a maximum concurrent multicommodity flow problem with $k$
commodities in $O(m^{1+o(1)} \epsilon^{-2} k^2)$ time, improving on the existing bound of $\tilde{O}(m^{4/3}\,\mathrm{poly}(k, \epsilon^{-1}))$ [10].



We believe that both our general framework and several of the pieces necessary for its present 
instantiation are of independent interest, and we hope that they will find other applications. These 
include: 

• a non-Euclidean generalization of gradient descent, bounds on its performance, and a way 
to use this to reduce approximate maximum flow and maximum concurrent flow to oblivious 
routing; 

• the definition and efficient construction of flow sparsifiers; and

• the construction of a new oblivious routing scheme that can be implemented extremely
efficiently.

We have aimed to make our algorithm fairly modular, and we have occasionally worked in slightly 
more generality than is strictly necessary for the problem at hand. This has slightly increased the 
length of the exposition, but we believe that it clarifies the high-level structure of the argument, 
and it will hopefully facilitate the application of these tools in other settings. 

1.1 Related Work 

For the first several decades of its study, the fastest algorithms for the maximum flow problem were 
essentially all deterministic algorithms based on combinatorial techniques, such as augmenting 
paths, blocking flows, preflows, and the push-relabel method. These culminated in the work of 
Goldberg and Rao [ ], which computes exact maximum flows in time
$O(m \min(n^{2/3}, m^{1/2}) \log(n^2/m) \log U)$ on graphs with edge weights in $\{0, \ldots, U\}$. We refer the
reader to [8] for a survey of these results.

More recently, a collection of new techniques based on randomization, spectral graph theory and 
numerical linear algebra, graph decompositions and embeddings, and iterative methods for convex 
optimization have emerged. These have allowed researchers to provide better provable algorithms 
for a wide range of flow and cut problems, particularly when one aims to obtain approximately 
optimal solutions on undirected graphs. 

Our algorithm draws extensively on the intellectual heritage established by these works. In this 
section, we will briefly review some of the previous advances that inform our algorithm. We do 
not give a comprehensive review of the literature, but instead aim to provide a high-level view of 
the main tools that motivated the present work, along with the limitations of these tools that had to
be overcome. For simplicity of exposition, we will primarily focus throughout the remainder of the 
introduction on the maximum s-t flow problem. 

Sparsification In [5], Benczúr and Karger showed how to efficiently approximate any graph
G with a sparse graph G' on the same vertex set. To do this, they compute a carefully chosen
probability $p_e$ for each $e \in E$, sample each edge $e$ with probability $p_e$, and include $e$ in G' with its
weight increased by a factor of $1/p_e$ if it is sampled. Using this, they obtain, in nearly linear
time, a graph G' with $O(n \log n/\epsilon^2)$ edges such that the total weight of the edges crossing any cut
in G' is within a multiplicative factor of $1 \pm \epsilon$ of the weight crossing the corresponding cut in G.
In particular, the Max-Flow Min-Cut Theorem implies that the value of the maximum flow on G'
is within a factor of $1 \pm \epsilon$ of that of G.
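The sampling step described above can be sketched in a few lines. This is only an illustration under stated assumptions: the helper names `sample_sparsifier` and `cut_weight` are hypothetical, and the uniform probability $p_e = 0.5$ is chosen for brevity, whereas Benczúr and Karger choose each $p_e$ carefully.

```python
import random

def sample_sparsifier(edges, probs, rng):
    """Keep edge e with probability probs[e]; a kept edge has its weight scaled by 1/probs[e].

    edges: dict mapping (u, v) -> weight; probs: dict mapping (u, v) -> p_e in (0, 1].
    """
    sparse = {}
    for e, w in edges.items():
        if rng.random() < probs[e]:
            sparse[e] = w / probs[e]
    return sparse

def cut_weight(edges, side):
    """Total weight of the edges with exactly one endpoint in `side`."""
    return sum(w for (u, v), w in edges.items() if (u in side) != (v in side))

# Complete graph on 30 vertices with unit weights.
# Assumption: a uniform p_e = 0.5, NOT the carefully chosen probabilities of [5].
n = 30
edges = {(u, v): 1.0 for u in range(n) for v in range(u + 1, n)}
probs = {e: 0.5 for e in edges}
sparse = sample_sparsifier(edges, probs, random.Random(0))

side = set(range(n // 2))
original_cut = cut_weight(edges, side)   # 15 * 15 crossing edges
sampled_cut = cut_weight(sparse, side)   # equals original_cut in expectation
```

The reweighting by $1/p_e$ is what makes every cut weight correct in expectation; the carefully chosen probabilities are what control the deviation over all cuts simultaneously.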



This is an extremely effective tool for approximately solving cut problems on a dense graph G, 
since one can simply solve the corresponding problem on the sparsified graph G'. However, while
this means that one can approximately compute the value of the maximum s-t flow on G by solving 
the problem on G', it is not known how to use the maximum s-t flow on G' to obtain an actual 
approximately maximum flow on G. Intuitively, this is because the weights of edges included in G' 
are larger than they were in G, and the sampling argument does not provide any guidance about 
how to route flows over these edges in the original graph G. 

Iterative algorithms based on linear systems and electrical flows In 2010, Christiano
et al. [ ] described a new linear algebraic approach to the problem that found $\epsilon$-approximately
maximum s-t flows in time $\tilde{O}(mn^{1/3}\,\mathrm{poly}(1/\epsilon))$. They treated the edges of G as electrical resistors
and then computed the electrical flow that would result from sending electrical current from s to
t in the corresponding circuit. They showed that these flows can be computed in nearly-linear
time using fast Laplacian linear system solvers [13, 14, 11], which we further discuss below. The
electrical flow obeys the flow conservation constraints, but it could violate the capacity constraints.
They then adjusted the resistances of edges to penalize the edges that were carrying too much
current and repeated the process. Kelner, Miller, and Peng [10] later showed how to use more
general objects that they called quadratically coupled flows to solve the maximum concurrent
multicommodity flow problem with a similar approach in time $\tilde{O}(m^{4/3}\,\mathrm{poly}(k, \epsilon^{-1}))$.

Following this, Lee, Rao, and Srivastava [16] proposed another iterative algorithm that uses
electrical flows, but in a way that was substantially different than in [ ]. Instead of adjusting the
resistances of the edges in each iteration to correct overflowing edges, they keep the resistances
the same but compute a new electrical flow to reroute the excess current. They explain how to
interpret this as gradient descent in a certain space, from which a standard analysis would give
an algorithm that runs in time $\tilde{O}(m^{1.5}\,\mathrm{poly}(1/\epsilon))$. By replacing the standard gradient descent
step with Nesterov's accelerated gradient descent method [22] and using a regularizer to make
the penalty function smoother, they obtain an algorithm that runs in time $\tilde{O}(mn^{1/3}\,\mathrm{poly}(1/\epsilon))$ in
unweighted graphs.

In all of these algorithms, the superlinear running times arise from an intrinsic $\Theta(\sqrt{n})$ factor
introduced by using electrical flows, which minimize an $\ell_2$ objective function, to approximate the
maximum congestion, which is an $\ell_\infty$ quantity.

Fast solvers for Laplacian linear systems In their breakthrough paper [30], Spielman and
Teng showed how to solve Laplacian systems in nearly-linear time. (This was later sped up and
simplified by Koutis, Miller, and Peng [13, 14] and Kelner, Orecchia, Sidford, and Zhu [11].) Their
algorithm worked by showing how to approximate the Laplacian $L_G$ of a graph G with the Laplacian
$L_H$ of a much simpler graph H such that one could use the ability to solve linear systems in $L_H$
to accelerate the solution of a linear system in $L_G$. They then applied this recursively to solve the
linear systems in $L_H$. In addition to providing the electrical flow primitive used by the algorithms
described above, the structure of their recursive sequence of graph simplifications provides the
motivating framework for much of the technical content of our oblivious routing construction.



Oblivious routing In an oblivious routing scheme, one specifies a linear operator taking any 
demand vector to a flow routing these demands over the edges of G. Given a collection of demand
vectors, one can produce a multicommodity flow meeting these demands by routing each demand 
vector using this pre-specified operator, independently of the others. The competitive ratio of such 
an operator is the worst possible ratio between the congestion incurred by a set of demands in this 
scheme and the congestion of the best multicommodity flow routing these demands. 
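As a toy illustration of the definition above, the following sketch routes demands on a path graph with a fixed linear operator. The path graph and the helper names `path_routing_operator` and `route` are assumptions made for the example; on a path the routing is unique, so this says nothing about how competitive ratios behave on general graphs.

```python
def path_routing_operator(n):
    """Linear routing operator A for the path 0 - 1 - ... - (n-1).

    Row e corresponds to the edge (e, e+1), oriented from e to e+1. That edge
    carries the total demand of the vertices 0..e, so A[e][v] = 1 exactly when v <= e.
    """
    return [[1.0 if v <= e else 0.0 for v in range(n)] for e in range(n - 1)]

def route(A, demand):
    """Apply the pre-specified operator to a single demand vector."""
    return [sum(a * d for a, d in zip(row, demand)) for row in A]

A = path_routing_operator(4)
demands = [
    [1.0, 0.0, 0.0, -1.0],   # one unit from vertex 0 to vertex 3
    [0.0, 1.0, -1.0, 0.0],   # one unit from vertex 1 to vertex 2
]
# Each demand vector is routed independently of the others.
flows = [route(A, chi) for chi in demands]

# Linearity: routing the summed demand equals summing the individual routings.
combined = route(A, [a + b for a, b in zip(*demands)])
```

Because the operator is fixed in advance, each commodity can be routed without looking at the others, which is what makes such schemes convenient as a subroutine.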

In [ ], Räcke showed how to construct an oblivious routing scheme with a competitive ratio of
O(log n). His construction worked by providing a probability distribution over trees $T_i$ such that G
embeds into each $T_i$ with congestion at most 1, and such that the corresponding convex combination
of trees embeds into G with congestion O(log n). In a sense, one can view this as showing how
to approximate G by a probability distribution over trees. Using this, he was able to show how
to obtain polylogarithmic approximations for a variety of cut and flow problems, given only the
ability to solve these problems on trees.

We note that an oblivious routing scheme clearly yields a logarithmic approximation to the
maximum flow and maximum concurrent multicommodity flow problems. However, Räcke's construction
took substantially superlinear time, making it too slow to be useful for computing approximately
maximum flows. Furthermore, it only gives a logarithmic approximation, and it is not clear
how to use this a small number of times to reduce the error to a multiplicative $\epsilon$.

In a later paper [18], Mądry applied a recursive technique similar to the one employed by Spielman
and Teng in their Laplacian solver to accelerate many of the applications of Räcke's construction at
the cost of a worse approximation ratio. Using this, he obtained almost-linear-time polylogarithmic 
approximation algorithms for a wide variety of cut problems. 

Unfortunately, his algorithm made extensive use of sparsification, which, for the previously
mentioned reasons, made it unable to solve the corresponding flow problems. This meant that, while it
could use flow-cut duality to find a polylogarithmic approximation of the value of a maximum flow,
it could not construct a corresponding flow or repeatedly apply such a procedure a small number
of times to decrease the error to a multiplicative $\epsilon$.

1.2 Our Approach 

In this section, we will give a high-level description of how we overcome the obstacles described
in the previous section. To avoid unnecessary notation, we will suppose for now that all of the 
edges have capacity 1. 

The problem is thus to send as many units of flow as possible from s to t without sending more than
one unit over any edge. It will be more convenient for us to work with an equivalent congestion
minimization problem, where we try to find the unit s-t flow $f$ (i.e., a flow sending one unit from
s to t) that minimizes $\|f\|_\infty = \max_e |f_e|$. If we begin with some initial unit s-t flow $f_0$, the goal
will thus be to find the circulation $c$ to add to $f_0$ that minimizes $\|f_0 + c\|_\infty$.

We will give an iterative algorithm to approximately find such a $c$. There will be
$2^{O(\sqrt{\log n \log\log n})} \epsilon^{-2}$ iterations, each of which will add a circulation to the present flow and
run in $m \cdot 2^{O(\sqrt{\log n \log\log n})}$ time. Constructing this scheme will consist of two main parts: an
iterative scheme that reduces the problem to the construction of a projection matrix with certain
properties; and the construction of such an operator.



The iterative scheme: Non-Euclidean gradient descent 

The simplest way to improve the flow would be to just perform gradient descent on the maximum 
congestion of an edge. There are two problems with this: 

The first problem is that gradient descent depends on having a smoothly varying gradient, but
the infinity norm is very far from smooth. This is easily remedied by a standard technique: we
replace the infinity norm with a smoother "soft max" function. Doing this would lead to an update
that would be a linear projection onto the space of circulations. This could be computed using
an electrical flow, and the resulting algorithm would be very similar to the unaccelerated gradient
descent algorithm in [?].

The more serious problem is the one discussed in the previous section: the difference between $\ell_2$
and $\ell_\infty$. Gradient steps choose a direction by optimizing a local approximation of the objective
function over a sphere, whereas the $\ell_\infty$ constraint asks us to optimize over a cube. The difference
between the size of the largest sphere inside a cube and the smallest sphere containing it gives
rise to an inherent $O(\sqrt{n})$ in the number of iterations, unless one can somehow exploit additional
structure in the problem.

To deal with this, we introduce and analyze a non-Euclidean variant of gradient descent that
operates with respect to an arbitrary norm.^1 Rather than choosing the direction by optimizing a
local linearization of the objective function over the sphere, it performs an optimization over the
unit ball in the given norm. By taking this norm to be $\ell_\infty$ instead of $\ell_2$, we are able to obtain a
much smaller bound on the number of iterations, albeit at the expense of having to solve a nonlinear
minimization problem at every step. The number of iterations required by the gradient descent
method depends on how quickly the gradient can change over balls in the norm we are using, which
we express in terms of the Lipschitz constant of the gradient in the chosen norm.

To apply this to our problem, we write flows meeting our demands as $f_0 + c$, as described above.
We then need a parametrization of the space of circulations so that the objective function (after being
smoothed using soft max) has a good bound on its Lipschitz constant. Similarly to what occurs
in [11], this comes down to finding a good linear representation of the space of circulations, which
we show here amounts to finding a matrix that projects into the space of circulations while meeting
certain norm bounds.

Constructing a projection matrix 

This reduces our problem to the construction of such a projection matrix. A simple calculation
shows that any linear oblivious routing scheme $A$ with a good competitive ratio gives rise to a
projection matrix with the desired properties, and thus leads to an iterative algorithm that converges
in a small number of iterations. Each of these iterations performs a matrix-vector multiplication
with both $A$ and $A^T$.

Intuitively, this is letting us replace the electrical flow routing used in previous algorithms with
that given by an oblivious routing scheme. Since the oblivious routing scheme was constructed to
meet $\ell_\infty$ guarantees, while the electrical flow could only obtain such guarantees by relating $\ell_2$ to
$\ell_\infty$, it is quite reasonable that we should expect this to lead to a better iterative algorithm.

^1 This idea and analysis seems to be implicit in other work, e.g., [ ]. However, we could not find a clean statement
like the one we need in the literature, and we have not seen it previously applied in similar settings. We believe that
it will find further applications, so we state it in fairly general terms before specializing to what we need for flow
problems.

However, the computation involved in existing oblivious routing schemes is not fast enough to be 
used in this setting. Our task thus becomes constructing an oblivious routing scheme that we can 
compute and work with very efficiently. We do this with a recursive construction that reduces 
oblivious routing in a graph to oblivious routing in various successively simpler graphs. 

To this end, we show that if G can be embedded with low congestion into H (existentially), and H
can be embedded with low congestion into G efficiently, one can use an oblivious routing on H to
obtain an oblivious routing on G. The crucial difference between the simplification operations we
perform here and those in previous papers (e.g., in the work of Benczúr-Karger [5] and Mądry [18])
is that ours are accompanied by such embeddings, which enables us to transfer flows from the
simpler graphs to the more complicated ones.

We construct our routing scheme by recursively composing two types of such reductions, each of 
which we show how to implement without incurring a large increase in the competitive ratio: 

Vertex reduction These show how to efficiently reduce oblivious routing on a graph to routing
on $t$ graphs with roughly $\tilde{O}(|E|/t)$ vertices.

To do this, we show how to efficiently embed a graph G = (V, E) into $t$ simpler graphs, each
consisting of a tree plus a subgraph supported on roughly $\tilde{O}(|E|/t)$ vertices. This follows easily from
a careful reading of Mądry's paper [18]. We then show that routing on such a graph can be reduced
to routing on a graph with at most $\tilde{O}(|E|/t)$ vertices by collapsing paths and eliminating leaves.

Flow sparsifiers These allow us to efficiently reduce oblivious routing on an arbitrary graph to
oblivious routing on a graph with $\tilde{O}(n)$ edges.

To construct flow sparsifiers, we use local partitioning to decompose the graph into well-connected
clusters that contain many of the original edges. (These clusters are not quite expanders, but they
are contained in graphs with good expansion in a manner that is sufficient for our purposes.) We
then sparsify these clusters using standard techniques and show that we can embed the sparse
graph back into the original graph using electrical flows. If the graph was originally dense, this
results in a sparser graph, and we can recurse on the result. While the implementation of these
steps is somewhat different, the outline of this construction closely parallels Spielman and Teng's
approach to the construction of spectral sparsifiers.

Combining these recursively yields an efficient oblivious routing scheme, and thus an algorithm for 
the maximum flow problem. 

We then show that the same framework can be applied to the maximum concurrent multicommodity 
flow problem. While the norm and regularization change, the structure of the argument and the 
construction of the oblivious routing scheme go through without requiring substantial modification. 

2 Preliminaries 



General notation: We typically use $x$ to denote a vector and $A$ to denote a matrix. For a vector
$x \in \mathbb{R}^n$, we let $|x| \in \mathbb{R}^n$ denote the vector such that for all $i$ we have $|x|_i = |x_i|$. For a matrix
$A \in \mathbb{R}^{n \times m}$, we let $|A|$ denote the matrix such that for all $i, j$ we have $|A|_{ij} = |A_{ij}|$. We let
$\mathbb{1}$ denote the all ones vector, we let $\mathbb{1}_i$ denote the vector that is one at position $i$ and 0 elsewhere,
we let $I$ be the identity matrix, and we overload notation and let $I_{a \to b} \in \mathbb{R}^{b \times a}$ denote the matrix
such that for all $i \le \min\{a, b\}$ we have $I_{ii} = 1$ and $I_{ij} = 0$ elsewhere.

Graph Specification: Let $G = (V, E, \mu)$ be an undirected capacitated graph with $n = |V|$ vertices
and $m = |E|$ edges where $\mu_e$ is the capacity of edge $e$. Let $w_e > 0$ denote the weight of an edge and
let $r_e = 1/w_e$ denote the resistance of an edge. For now we make no connection between $\mu_e$ and
$r_e$; we fix their relationship later. Also, note that while all graphs in this paper will be undirected,
we will always assume an arbitrary orientation assigned to the edges.

Fundamental Matrices: Let $U, W, R \in \mathbb{R}^{E \times E}$ denote the diagonal matrices associated with
the capacities, the weights, and the resistances respectively. Let $B \in \mathbb{R}^{E \times V}$ denote the graph's
incidence matrix where for all $e = (a, b) \in E$ we have $B^T \mathbb{1}_e = \mathbb{1}_a - \mathbb{1}_b$. Let $\mathcal{L} \in \mathbb{R}^{V \times V}$ denote the
combinatorial graph Laplacian and recall that $\mathcal{L} = B^T R^{-1} B$.
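The following sketch builds the incidence matrix and the Laplacian for a small example, to make the conventions above concrete. The triangle graph, its resistances, and the helper names `incidence_matrix` and `laplacian` are assumptions chosen for illustration.

```python
def incidence_matrix(n, edges):
    """Rows are indexed by edges and columns by vertices; the row of e = (a, b) is 1_a - 1_b."""
    B = [[0.0] * n for _ in edges]
    for row, (a, b) in zip(B, edges):
        row[a] = 1.0
        row[b] = -1.0
    return B

def laplacian(n, edges, resistances):
    """Combinatorial Laplacian B^T R^{-1} B, accumulated one edge at a time."""
    B = incidence_matrix(n, edges)
    lap = [[0.0] * n for _ in range(n)]
    for row, r in zip(B, resistances):
        for i in range(n):
            for j in range(n):
                lap[i][j] += row[i] * row[j] / r
    return lap

# Triangle on vertices 0, 1, 2 with an arbitrary orientation of each edge.
edges = [(0, 1), (1, 2), (0, 2)]
resistances = [1.0, 2.0, 4.0]          # weights are 1, 1/2, 1/4
L = laplacian(3, edges, resistances)
```

Each diagonal entry of `L` is the weighted degree of the corresponding vertex and each row sums to zero, as expected of a graph Laplacian.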
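Sizes: For all $a \in V$ we let $d_a$ denote the (weighted) degree of vertex $a$,

$$\forall a \in V : \quad d_a \stackrel{\mathrm{def}}{=} \sum_{\{a,b\} \in E} w_{a,b},$$

and let $D \in \mathbb{R}^{V \times V}$ be the diagonal matrix where $D_{a,a} = d_a$. Furthermore, for any vertex subset
$S \subseteq V$ we define its volume by

$$\mathrm{vol}(S) \stackrel{\mathrm{def}}{=} \sum_{a \in S} d_a.$$

Furthermore, we let $\deg(a)$ denote the (combinatorial) degree of $a$, i.e.
$\deg(a) = |\{e \in E \mid e = \{a, b\} \text{ for some } b \in V\}|$.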

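Cuts: For any vertex subset $S \subseteq V$ we denote the cut induced by $S$ by the edge subset

$$\partial(S) \stackrel{\mathrm{def}}{=} \{e \in E \mid e \not\subseteq S \text{ and } e \not\subseteq V \setminus S\}$$

and we define the cost of any cut $F \subseteq E$ by $w(F) = \sum_{e \in F} w_e$. Using this we define the conductance
of a set $S \subseteq V$ by

$$\Phi(S) \stackrel{\mathrm{def}}{=} \frac{w(\partial(S))}{\min\{\mathrm{vol}(S), \mathrm{vol}(V - S)\}}$$

and we denote the conductance of a graph by

$$\Phi(G) \stackrel{\mathrm{def}}{=} \min_{S \subset V \,:\, S \notin \{\emptyset, V\}} \Phi(S).$$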
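A short sketch may help check these definitions on a concrete graph. The example graph (two triangles joined by a single edge) and the helper name `conductance` are assumptions for illustration.

```python
def conductance(weights, S, vertices):
    """Conductance of S: w(cut(S)) / min(vol(S), vol(V - S)).

    weights: dict mapping an undirected edge (a, b) -> w_ab.
    """
    S = set(S)
    degree = {a: 0.0 for a in vertices}
    for (a, b), w in weights.items():
        degree[a] += w
        degree[b] += w
    # An edge is in the cut exactly when its endpoints lie on different sides.
    cut = sum(w for (a, b), w in weights.items() if (a in S) != (b in S))
    vol_S = sum(degree[a] for a in vertices if a in S)
    vol_rest = sum(degree[a] for a in vertices if a not in S)
    return cut / min(vol_S, vol_rest)

# Two unit-weight triangles {0, 1, 2} and {3, 4, 5} joined by the edge (2, 3).
vertices = list(range(6))
weights = {(0, 1): 1.0, (1, 2): 1.0, (0, 2): 1.0,
           (3, 4): 1.0, (4, 5): 1.0, (3, 5): 1.0,
           (2, 3): 1.0}
phi = conductance(weights, {0, 1, 2}, vertices)   # cut weight 1, volume 2 + 2 + 3 = 7
```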

Subgraphs: For a graph G = (V, E) and a vertex subset $S \subseteq V$ let $G(S)$ denote the subgraph of G
consisting of vertex set $S$ and all the edges of $E$ with both endpoints in $S$, i.e. $\{(a, b) \in E \mid a, b \in S\}$.
When we consider a term such as vol or $\Phi$ and we wish to make clear which graph it is taken with
respect to, we use subscripts, so for example $\mathrm{vol}_{G(S)}(A)$ denotes the volume of vertex set $A$ in
the subgraph of G induced by $S$.

Congestion: For any vector $f \in \mathbb{R}^E$ we let the congestion^2 of $f$ be given by $\mathrm{cong}(f) \stackrel{\mathrm{def}}{=} \|U^{-1} f\|_\infty$.
For any collection of flows $\{f_i\} = \{f_1, \ldots, f_k\}$ we overload notation and let the total congestion of
these flows be given by

$$\mathrm{cong}(\{f_i\}) \stackrel{\mathrm{def}}{=} \Big\| U^{-1} \sum_i |f_i| \Big\|_\infty.$$

^2 Note that here and in the rest of the paper we will focus our analysis on congestion with respect to the norm
$\|\cdot\|_\infty$, and we will look at oblivious routing strategies that are competitive with respect to this norm. However, many
of the results presented are easily generalizable to other norms; these generalizations are outside the scope of this paper.
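The total congestion can be computed directly from its definition. In the sketch below, the capacities, the two flows, and the helper name `congestion` are assumptions for illustration; note that flows in opposite directions on the same edge do not cancel, because absolute values are taken per commodity.

```python
def congestion(capacities, flows):
    """Total congestion: the largest value of sum_i |f_i(e)| / u_e over all edges e."""
    return max(
        sum(abs(f[e]) for f in flows) / u
        for e, u in enumerate(capacities)
    )

capacities = [2.0, 1.0, 4.0]
f1 = [1.0, -0.5, 2.0]
f2 = [-1.0, 0.25, 1.0]

single = congestion(capacities, [f1])        # every edge is exactly half full
total = congestion(capacities, [f1, f2])     # edge 0 carries |1| + |-1| = 2 units
```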

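Demands and Multicommodity Flow Now we call a vector $\chi \in \mathbb{R}^V$ a demand vector if
$\sum_{a \in V} \chi(a) = 0$, and given a set of demands $D = \{\chi_1, \ldots, \chi_k\}$, i.e. $\forall i \in [k],\ \sum_{a \in V} \chi_i(a) = 0$, we
denote the optimal low congestion routing of these demands as follows

$$\mathrm{opt}(D) \stackrel{\mathrm{def}}{=} \min_{\{f_i\} \,:\, \forall i,\ B^T f_i = \chi_i} \mathrm{cong}(\{f_i\}).$$

We call a set of vectors $\{f_i\}$ that meet demands $\{\chi_i\}$, i.e. $\forall i,\ B^T f_i = \chi_i$, a multicommodity flow
meeting the demands.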

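Operator Norm: Let $\|\cdot\|$ be a family of norms applicable to $\mathbb{R}^n$ for any $n$. We define this norm's
induced norm or operator norm on the set of $n \times m$ matrices by

$$\forall A \in \mathbb{R}^{n \times m} : \quad \|A\| \stackrel{\mathrm{def}}{=} \max_{x \in \mathbb{R}^m,\ x \ne 0} \frac{\|Ax\|}{\|x\|}.$$

Running Time: For any matrix $A$ we let $T(A)$ denote the time needed to apply both $A$ and
$A^T$.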



3 Solving Max-Flow Using a Circulation Projection 

3.1 Gradient Descent 

In this section, we discuss the gradient descent method for general norms. Let $\|\cdot\| : \mathbb{R}^n \to \mathbb{R}$ be an
arbitrary norm on $\mathbb{R}^n$ and recall that the gradient of $f$ at $x$ is defined to be the vector $\nabla f(x) \in \mathbb{R}^n$
such that

$$f(y) = f(x) + \langle \nabla f(x), y - x \rangle + o(\|y - x\|). \tag{1}$$

The gradient descent method is a greedy minimization method that updates the current vector,
$x$, using the direction which minimizes $\langle \nabla f(x), y - x \rangle$. To analyze this method's performance,
we need a tool to compare the improvement $\langle \nabla f(x), y - x \rangle$ with the step size, $\|y - x\|$, and the
quantity, $\nabla f(x)$. For the $\ell_2$ norm, this can be done by the Cauchy-Schwarz inequality, and in general
we can define a new norm for $\nabla f(x)$ to make this happen. We call this the dual norm $\|\cdot\|^*$, defined
as follows

$$\|x\|^* \stackrel{\mathrm{def}}{=} \max_{y \in \mathbb{R}^n \,:\, \|y\| \le 1} \langle y, x \rangle.$$

Fact 53 shows that this definition indeed yields that $\langle y, x \rangle \le \|y\|^* \|x\|$. Next, we define the fastest
increasing direction $x^\#$, which is an arbitrary point satisfying the following

$$x^\# \stackrel{\mathrm{def}}{=} \arg\max_{s \in \mathbb{R}^n} \ \langle x, s \rangle - \frac{1}{2}\|s\|^2.$$



In the appendix, we provide some facts about $\|\cdot\|^*$ and $x^\#$ that we will use in this section. Using
the notations defined, the gradient descent method simply produces a sequence of $x_k$ such that

$$x_{k+1} := x_k - t_k (\nabla f(x_k))^\#$$

where $t_k$ is some chosen step size for iteration $k$. To determine what these step sizes should be we
need some information about the smoothness of the function, in particular, the magnitude of the
second order term in (1). The natural notion of smoothness for gradient descent is the Lipschitz
constant of the gradient of $f$, that is the smallest constant $L$ such that

$$\forall x, y \in \mathbb{R}^n : \quad \|\nabla f(x) - \nabla f(y)\|^* \le L \cdot \|x - y\|.$$

In the appendix we provide an equivalent definition and a way to compute $L$, which is useful later.

Let $X^* \subseteq \mathbb{R}^n$ denote the set of optimal solutions to the unconstrained minimization problem
$\min_{x \in \mathbb{R}^n} f(x)$ and let $f^*$ denote the optimal value of this minimization problem, i.e.

$$\forall x \in X^* : \ f(x) = f^* = \min_{y \in \mathbb{R}^n} f(y) \quad \text{and} \quad \forall x \notin X^* : \ f(x) > f^*.$$

We assume that $X^*$ is non-empty. Now, we are ready to estimate the convergence rate of the
gradient descent method.

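Theorem 1 (Gradient Descent). Let $f : \mathbb{R}^n \to \mathbb{R}$ be a convex continuously differentiable function
and let $L$ be the Lipschitz constant of $\nabla f$. For initial point $x_0 \in \mathbb{R}^n$ we define a sequence of $x_k$ by
the update rule

$$x_{k+1} := x_k - \frac{1}{L} (\nabla f(x_k))^\#.$$

For all $k \ge 0$, we have

$$f(x_k) - f^* \le \frac{2 \cdot L \cdot R^2}{k + 4} \quad \text{where} \quad R \stackrel{\mathrm{def}}{=} \max_{x \in \mathbb{R}^n \,:\, f(x) \le f(x_0)} \ \min_{x^* \in X^*} \|x - x^*\|.$$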

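Proof.^3 By the Lipschitz continuity of the gradient of $f$ and Lemma 54 we have

$$f(x_{k+1}) \le f(x_k) - \frac{1}{2L} \left( \|\nabla f(x_k)\|^* \right)^2.$$

Furthermore, by the convexity of $f$, we know that

$$\forall x, y \in \mathbb{R}^n : \quad f(y) \ge f(x) + \langle \nabla f(x), y - x \rangle.$$

Using this and the fact that $f(x_k)$ decreases monotonically with $k$, we get

$$f(x_k) - f^* \le \min_{x^* \in X^*} \langle \nabla f(x_k), x_k - x^* \rangle \le \min_{x^* \in X^*} \|\nabla f(x_k)\|^* \|x_k - x^*\| \le R \, \|\nabla f(x_k)\|^*.$$

Therefore, letting $\phi_k = f(x_k) - f^*$, we have

$$\phi_k - \phi_{k+1} \ge \frac{1}{2L} \left( \|\nabla f(x_k)\|^* \right)^2 \ge \frac{\phi_k^2}{2 \cdot L \cdot R^2}.$$

Furthermore, since $\phi_k \ge \phi_{k+1}$, we have

$$\frac{1}{\phi_{k+1}} - \frac{1}{\phi_k} = \frac{\phi_k - \phi_{k+1}}{\phi_k \phi_{k+1}} \ge \frac{\phi_k - \phi_{k+1}}{\phi_k^2} \ge \frac{1}{2 \cdot L \cdot R^2}.$$

So, by induction, we have that

$$\frac{1}{\phi_k} - \frac{1}{\phi_0} \ge \frac{k}{2 \cdot L \cdot R^2}.$$

Now, note that since $\nabla f(x^*) = 0$, we have that

$$f(x_0) \le f(x^*) + \langle \nabla f(x^*), x_0 - x^* \rangle + \frac{L}{2} \|x_0 - x^*\|^2 \le f(x^*) + \frac{L}{2} R^2.$$

So, we have that $\phi_0 \le \frac{L}{2} R^2$ and putting this all together yields that

$$\frac{1}{\phi_k} \ge \frac{1}{\phi_0} + \frac{k}{2 \cdot L \cdot R^2} \ge \frac{4}{2 \cdot L \cdot R^2} + \frac{k}{2 \cdot L \cdot R^2}. \qquad \square$$

^3 The structure of this specific proof was modeled after a proof in [ ] for a slightly different problem.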
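To see the update rule of Theorem 1 in action, the following sketch runs it in the $\ell_\infty$ norm on a simple quadratic. Everything specific here is an assumption for illustration: the objective $f(x) = \frac{1}{2}\|x - a\|_2^2$, the vector `a`, and the helper names. The sketch uses the closed form of the $\#$ operator for $\ell_\infty$ (each coordinate gets the sign of the input and magnitude equal to its $\ell_1$ norm), and the Lipschitz constant $L = n$, which holds for this objective because $\|x - y\|_1 \le n \|x - y\|_\infty$.

```python
def sharp_linf(g):
    """# operator for the l_infinity norm: nonzero coordinates get magnitude ||g||_1."""
    norm1 = sum(abs(v) for v in g)
    return [norm1 if v > 0 else -norm1 if v < 0 else 0.0 for v in g]

def gradient_descent(grad, x0, lipschitz, steps):
    """Iterate x_{k+1} = x_k - (1/L) * (grad f(x_k))^# and record every iterate."""
    x = list(x0)
    history = [x]
    for _ in range(steps):
        direction = sharp_linf(grad(x))
        x = [xi - d / lipschitz for xi, d in zip(x, direction)]
        history.append(x)
    return history

# Assumed toy objective: f(x) = 0.5 * ||x - a||_2^2, minimized at x = a with f* = 0.
a = [1.0, 3.0, -2.0]

def f(x):
    return 0.5 * sum((xi - ai) ** 2 for xi, ai in zip(x, a))

def grad(x):
    return [xi - ai for xi, ai in zip(x, a)]

L = float(len(a))   # Lipschitz constant of grad f from l_infinity to its dual l_1
history = gradient_descent(grad, [0.0, 0.0, 0.0], L, steps=20)
values = [f(x) for x in history]

# For this objective the sublevel set of x_0 is an l_2 ball of radius ||a||_2,
# so R^2 = ||a||_2^2 = 14 in the bound 2 * L * R^2 / (k + 4).
R_squared = sum(ai ** 2 for ai in a)
bounds = [2 * L * R_squared / (k + 4) for k in range(len(values))]
```

On this example the objective values stay well below the bound of the theorem, which is a worst-case guarantee.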



3.2 Maximum Flow Formulation 

For an arbitrary set of demands $\chi \in \mathbb{R}^V$ we wish to solve the following maximum flow problem

$$\max_{\alpha \in \mathbb{R},\ f \in \mathbb{R}^E} \alpha \quad \text{subject to} \quad B^T f = \alpha \chi \ \text{ and } \ \|U^{-1} f\|_\infty \le 1.$$

Equivalently, we want to compute a minimum congestion flow

$$\min_{f \in \mathbb{R}^E \,:\, B^T f = \chi} \|U^{-1} f\|_\infty,$$

where we call $\|U^{-1} f\|_\infty$ the congestion of $f$.

Letting $f_0 \in \mathbb{R}^E$ be some initial feasible flow, i.e. $B^T f_0 = \chi$, we write the problem equivalently as

$$\min_{c \in \mathbb{R}^E \,:\, B^T c = 0} \|U^{-1}(f_0 + c)\|_\infty$$

where the output flow is $f = f_0 + c$. Although the gradient descent method is applicable to
constrained optimization problems and has a similar convergence guarantee, the sub-problem involved
in each iteration is a constrained optimization problem, which is too complicated in this case. Since
the domain is a linear subspace, the constraints can be avoided by projecting the variables onto
this subspace.

Formally, we define a circulation projection matrix as follows. 

Definition 2. We call a matrix $\tilde{P} \in \mathbb{R}^{E \times E}$ a circulation projection matrix if it is a projection
matrix onto the circulation space, i.e. it satisfies the following

• $\forall x \in \mathbb{R}^E$ we have $B^T \tilde{P} x = 0$.

• $\forall x \in \mathbb{R}^E$ with $B^T x = 0$ we have $\tilde{P} x = x$.

Then, the problem becomes

$$\min_{c \in \mathbb{R}^E} \|U^{-1}(f_0 + \tilde{P} c)\|_\infty.$$

Applying gradient descent on this problem is similar to applying the projected gradient method on the
original problem. But, instead of using the orthogonal projection, which is not suitable for $\|\cdot\|_\infty$, we
try to pick a better projection matrix.

Applying the change of basis $x = U^{-1} c$ and letting $\alpha_0 = U^{-1} f_0$ and $P = U^{-1} \tilde{P} U$, we write the
problem equivalently as

$$\min_{x \in \mathbb{R}^E} \|\alpha_0 + P x\|_\infty$$

where the output maximum flow is

$$f(x) = U(\alpha_0 + P x) / \|\alpha_0 + P x\|_\infty.$$

3.3 An Approximate Maximum Flow Algorithm 

Since the gradient descent method requires the objective function to be differentiable, we introduce
a smooth version of $\|\cdot\|_\infty$ which we call $\mathrm{smax}_t$. In the next section, we prove that there is a convex
differentiable function $\mathrm{smax}_t$ such that $\nabla \mathrm{smax}_t$ is Lipschitz continuous with Lipschitz constant $\frac{1}{t}$ and
such that

$$\forall x \in \mathbb{R}^E : \quad \|x\|_\infty - t \ln(2m) \le \mathrm{smax}_t(x) \le \|x\|_\infty.$$

Now we consider the following regularized optimization problem

$$\min_{x \in \mathbb{R}^E} g_t(x) \quad \text{where} \quad g_t(x) = \mathrm{smax}_t(\alpha_0 + P x).$$

For the rest of this section, we consider solving this optimization problem using gradient descent
under $\|\cdot\|_\infty$.

First, we bound the Lipschitz constant of the gradient of $g_t$.

Lemma 3. The gradient of $g_t$ is Lipschitz continuous with Lipschitz constant $L = \frac{\|P\|_\infty^2}{t}$.

Proof. By Lemma 54 and the Lipschitz continuity of $\nabla \mathrm{smax}_t$, we have

$$\mathrm{smax}_t(y) \le \mathrm{smax}_t(x) + \langle \nabla \mathrm{smax}_t(x), y - x \rangle + \frac{1}{2t} \|y - x\|_\infty^2.$$

Setting $x \leftarrow \alpha_0 + P x$ and $y \leftarrow \alpha_0 + P y$, we have

$$g_t(y) \le g_t(x) + \langle \nabla \mathrm{smax}_t(\alpha_0 + P x), P y - P x \rangle + \frac{1}{2t} \|P y - P x\|_\infty^2$$
$$\le g_t(x) + \langle P^T \nabla \mathrm{smax}_t(\alpha_0 + P x), y - x \rangle + \frac{1}{2t} \|P\|_\infty^2 \|y - x\|_\infty^2$$
$$= g_t(x) + \langle \nabla g_t(x), y - x \rangle + \frac{1}{2t} \|P\|_\infty^2 \|y - x\|_\infty^2.$$

Hence, the result follows from Lemma 54. $\square$

Now, we apply gradient descent to find an approximate max flow as follows.



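MaxFlow

Input: any initial feasible flow $f_0$ and $\mathrm{OPT} = \min_x \|U^{-1} f_0 + P x\|_\infty$.

1. Let $\alpha_0 = (I - P) U^{-1} f_0$ and $x_0 = 0$.

2. Let $t = \epsilon \mathrm{OPT} / 2\ln(2m)$ and $k = 300 \|P\|_\infty^4 \ln(2m)/\epsilon^2$.

3. Let $g_t = \mathrm{smax}_t(\alpha_0 + P x)$.

4. For $i = 0, \ldots, k - 1$

5.     $x_{i+1} = x_i - \frac{t}{\|P\|_\infty^2} (\nabla g_t(x_i))^\#$. (See Lemma 48)

6. Output $U(\alpha_0 + P x_k) / \|\alpha_0 + P x_k\|_\infty$.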
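The steps of MaxFlow can be traced on a tiny instance. The sketch below is an illustration under several assumptions, not the construction used in the paper: the graph is a unit-capacity triangle with edges (0,1), (1,2), (0,2), the source is vertex 0 and the sink is vertex 2, the projection `P` is the orthogonal projection onto the single cycle of the triangle (the paper instead builds its projection from an oblivious routing scheme), and `opt=0.5` is supplied as a known input. All function names are hypothetical.

```python
import math

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def grad_smax(y, t):
    """Gradient of smax_t at y, with the exponents shifted to avoid overflow."""
    shift = max(abs(v) for v in y) / t
    pos = [math.exp(v / t - shift) for v in y]
    neg = [math.exp(-v / t - shift) for v in y]
    total = sum(pos) + sum(neg)
    return [(p - q) / total for p, q in zip(pos, neg)]

def sharp_linf(g):
    """# operator for the l_infinity norm: sign(g_e) * ||g||_1 in every coordinate."""
    norm1 = sum(abs(v) for v in g)
    return [math.copysign(norm1, v) if v != 0 else 0.0 for v in g]

def max_flow_sketch(P, capacities, f0, opt, eps):
    m = len(capacities)
    norm_P = max(sum(abs(v) for v in row) for row in P)   # induced l_infinity norm
    t = eps * opt / (2 * math.log(2 * m))                 # step 2
    k = math.ceil(300 * norm_P ** 4 * math.log(2 * m) / eps ** 2)
    scaled = [f / u for f, u in zip(f0, capacities)]      # U^{-1} f0
    alpha0 = [a - b for a, b in zip(scaled, matvec(P, scaled))]   # step 1
    P_transpose = [list(col) for col in zip(*P)]
    x = [0.0] * m
    step = t / norm_P ** 2
    for _ in range(k):                                    # steps 4 and 5
        y = [a + b for a, b in zip(alpha0, matvec(P, x))]
        g = matvec(P_transpose, grad_smax(y, t))          # gradient of g_t at x
        x = [xi - step * d for xi, d in zip(x, sharp_linf(g))]
    y = [a + b for a, b in zip(alpha0, matvec(P, x))]
    cong = max(abs(v) for v in y)
    return [u * v / cong for u, v in zip(capacities, y)]  # step 6

# Cycle vector of the triangle is (1, 1, -1); P = c c^T / 3 projects onto it.
P = [[1 / 3, 1 / 3, -1 / 3], [1 / 3, 1 / 3, -1 / 3], [-1 / 3, -1 / 3, 1 / 3]]
capacities = [1.0, 1.0, 1.0]
f0 = [0.0, 0.0, 1.0]            # one unit sent along the direct edge (0, 2)
eps = 0.2
flow = max_flow_sketch(P, capacities, f0, opt=0.5, eps=eps)
value = flow[0] + flow[2]       # net flow leaving the source; the true maximum is 2
```

The returned flow respects every capacity by construction, since the last step rescales by the final congestion, and its value should be at least $(1 - \epsilon)$ times the maximum flow value of 2.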



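Theorem 4. Let $\tilde{P}$ be a circulation projection matrix, let $P = U^{-1} \tilde{P} U$, and let $\epsilon < 1$. MaxFlow
outputs a $(1 - \epsilon)$-approximate maximum flow in time

$$O\left( \frac{\|P\|_\infty^4 \ln(m) \left( T(P) + m \right)}{\epsilon^2} \right).$$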



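Proof. First, we bound $\|\alpha_0\|_\infty$. Let $x^*$ be a minimizer of $\min_x \|U^{-1} f_0 + P x\|_\infty$ such that $P x^* = x^*$.
Then, we have

$$\|\alpha_0\|_\infty = \|U^{-1} f_0 - P U^{-1} f_0\|_\infty$$
$$\le \|U^{-1} f_0 + x^*\|_\infty + \|x^* + P U^{-1} f_0\|_\infty$$
$$= \|U^{-1} f_0 + x^*\|_\infty + \|P x^* + P U^{-1} f_0\|_\infty$$
$$\le \left(1 + \|P\|_\infty\right) \|U^{-1} f_0 + x^*\|_\infty$$
$$= \left(1 + \|P\|_\infty\right) \mathrm{OPT}.$$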

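Second, we bound $R$ in Theorem 1. Note that

$$g_t(x_0) = \mathrm{smax}_t(\alpha_0) \le \|\alpha_0\|_\infty \le \left(1 + \|P\|_\infty\right) \mathrm{OPT}.$$

Hence, the condition $g_t(x) \le g_t(x_0)$ implies that

$$\|\alpha_0 + P x\|_\infty \le \left(1 + \|P\|_\infty\right) \mathrm{OPT} + t \ln(2m).$$

For any $y \in X^*$ let $c = x - P x + P y$ and note that $P c = P y$ and therefore $c \in X^*$. Using these facts,
we can bound $R$ as follows

$$R = \max_{x \in \mathbb{R}^E \,:\, g_t(x) \le g_t(x_0)} \left\{ \min_{x^* \in X^*} \|x - x^*\|_\infty \right\}$$
$$\le \max_{x \in \mathbb{R}^E \,:\, g_t(x) \le g_t(x_0)} \|x - c\|_\infty$$
$$\le \max_{x \in \mathbb{R}^E \,:\, g_t(x) \le g_t(x_0)} \|P x - P y\|_\infty$$
$$\le \max_{x \in \mathbb{R}^E \,:\, g_t(x) \le g_t(x_0)} \|P x\|_\infty + \|P y\|_\infty$$
$$\le 2\|\alpha_0\|_\infty + \|\alpha_0 + P x\|_\infty + \|\alpha_0 + P y\|_\infty$$
$$\le 2\|\alpha_0\|_\infty + 2\|\alpha_0 + P x\|_\infty$$
$$\le 4\left(1 + \|P\|_\infty\right) \mathrm{OPT} + 2 t \ln(2m).$$

From Lemma 3, we know that the Lipschitz constant of $\nabla g_t$ is $\|P\|_\infty^2 / t$. Hence, Theorem 1 shows
that

$$g_t(x_k) \le \min_x g_t(x) + \frac{2 \cdot L \cdot R^2}{k + 4} \le \mathrm{OPT} + \frac{2 \cdot L \cdot R^2}{k + 4}.$$

So, we have

$$\|\alpha_0 + P x_k\|_\infty \le g_t(x_k) + t \ln(2m)$$
$$\le \mathrm{OPT} + t \ln(2m) + \frac{2 \|P\|_\infty^2}{t (k + 4)} \left( 4 \left(1 + \|P\|_\infty\right) \mathrm{OPT} + 2 t \ln(2m) \right)^2.$$

Using $t = \epsilon \mathrm{OPT} / 2\ln(2m)$ and $k = 300 \|P\|_\infty^4 \ln(2m)/\epsilon^2$, we have

$$\|\alpha_0 + P x_k\|_\infty \le (1 + \epsilon) \mathrm{OPT}.$$

Therefore, $\alpha_0 + P x_k$ is a $(1 - \epsilon)$-approximate maximum flow.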

Now, we estimate the running time. In each step 5, we are required to compute $(\nabla g_t(x_k))^{\#}$. The
gradient
\[ \nabla g_t(x) = P^T \nabla\mathrm{smax}_t(\alpha_0 + Px) \]
can be computed in $O(\mathcal{T}(P) + m)$ using the formula of the gradient of $\mathrm{smax}_t$ and applications of $P$
and $P^T$. Lemma 48 shows that the # operator can be computed in $O(m)$. □
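As an illustration only, the following Python sketch runs the iteration of MaxFlow on a toy instance. It assumes that $P$ is given as a small dense matrix (the analysis only needs fast products with $P$ and $P^T$), that OPT is supplied by the caller, and that the gradient of $\mathrm{smax}_t$ and the # operator have the closed forms of Lemma 6 and Lemma 5. All function names are hypothetical and chosen for this example.

```python
import math

def matvec(M, v):
    """Dense matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in M]

def inf_norm_matrix(M):
    """Induced l_inf norm: largest absolute row sum."""
    return max(sum(abs(a) for a in row) for row in M)

def smax_grad(y, t):
    """Gradient s / (1^T c) of smax_t, shifted by max|y|/t to avoid overflow."""
    z = max(abs(v) for v in y) / t
    c = [math.exp(v / t - z) + math.exp(-v / t - z) for v in y]
    s = [math.exp(v / t - z) - math.exp(-v / t - z) for v in y]
    total = sum(c)
    return [se / total for se in s]

def sharp(x):
    """# operator for the l_inf norm: sign(x_e) * ||x||_1."""
    l1 = sum(abs(v) for v in x)
    return [((v > 0) - (v < 0)) * l1 for v in x]

def max_flow_descent(P, alpha0, opt, eps):
    """Run k steps x <- x - (t / ||P||^2) (grad g_t(x))^# and return alpha0 + P x_k."""
    m = len(alpha0)
    norm_p = inf_norm_matrix(P)
    t = eps * opt / (2 * math.log(2 * m))
    k = math.ceil(300 * norm_p ** 4 * math.log(2 * m) / eps ** 2)
    Pt = [list(col) for col in zip(*P)]
    x = [0.0] * m
    for _ in range(k):
        y = [a + b for a, b in zip(alpha0, matvec(P, x))]
        grad = matvec(Pt, smax_grad(y, t))  # P^T grad smax_t(alpha0 + P x)
        step = sharp(grad)
        x = [xi - (t / norm_p ** 2) * si for xi, si in zip(x, step)]
    return [a + b for a, b in zip(alpha0, matvec(P, x))]
```

For two parallel unit-capacity edges, taking $P$ with rows $(0,-1)$ and $(0,1)$ and $\alpha_0 = (1,0)$ gives $\mathrm{OPT} = 1/2$, and the returned vector should be close to $(1/2, 1/2)$, i.e. the flow is split evenly between the two edges.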

Lemma 5. In $\|\cdot\|_\infty$, the # operator is given by the explicit formula
\[ (x^{\#})_e = \mathrm{sign}(x_e)\,\|x\|_1 \quad \text{for } e \in E. \]

Proof. Recall that
\[ x^{\#} = \arg\max_{s \in \mathbb{R}^E} \; \langle x, s \rangle - \frac{1}{2}\|s\|_\infty^2. \]
It is easy to see that for all $e \in E$, $\|x^{\#}\|_\infty = |(x^{\#})_e|$. In particular, we have
\[ (x^{\#})_e = \mathrm{sign}(x_e)\,\|x^{\#}\|_\infty. \]
Fact 52 shows that $\|x^{\#}\|_\infty = \|x\|_1$, and the result follows. □
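The formula of Lemma 5 is easy to check numerically. The sketch below is only an illustration: the names `sharp_linf` and `objective` are ours, and coordinates with $x_e = 0$ are mapped to $0$, which is one of several maximizers in that case.

```python
def sharp_linf(x):
    """# operator for the l_inf norm: (x^#)_e = sign(x_e) * ||x||_1."""
    l1 = sum(abs(v) for v in x)
    return [((v > 0) - (v < 0)) * l1 for v in x]

def objective(x, s):
    """The quantity <x, s> - 0.5 * ||s||_inf^2 that x^# maximizes over s."""
    inner = sum(a * b for a, b in zip(x, s))
    return inner - 0.5 * max(abs(b) for b in s) ** 2
```

At the maximizer the objective equals $\frac{1}{2}\|x\|_1^2$, which is the identity used in the convergence analysis of the gradient step.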



3.4 Properties of soft max 

In this section, we define smaxt and discuss its properties. Formally, the regularized convex function 
can be found by smoothing technique using convex conjugate [21] [G, Sec 5.4]. For simplicity and 
completeness, we define it explicitly and prove its properties directly. Formally, we define 



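\[ \forall x \in \mathbb{R}^E, \forall t \in \mathbb{R}_+ : \quad \mathrm{smax}_t(x) = t \ln\left( \frac{\sum_{e \in E} \left( \exp(x_e/t) + \exp(-x_e/t) \right)}{2m} \right). \]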



For notational simplicity, for all x where this vector is clear from context, we define c and s as 
follows 



\[ \forall e \in E : \quad c_e := \exp\left(\frac{x_e}{t}\right) + \exp\left(-\frac{x_e}{t}\right) \quad \text{and} \quad s_e := \exp\left(\frac{x_e}{t}\right) - \exp\left(-\frac{x_e}{t}\right), \]



where the letters are chosen due to the very close resemblance to hyperbolic sine and hyperbolic 
cosine. 



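Lemma 6.
\[ \forall x \in \mathbb{R}^E : \quad \nabla\mathrm{smax}_t(x) = \frac{s}{1^T c}, \]
\[ \forall x \in \mathbb{R}^E : \quad \nabla^2\mathrm{smax}_t(x) = \frac{1}{t\,(1^T c)} \left[ \mathrm{diag}(c) - \frac{s s^T}{1^T c} \right]. \]

Proof. For all $i \in E$ and $x \in \mathbb{R}^E$, we have
\[ \frac{\partial}{\partial x_i}\mathrm{smax}_t(x) = \frac{\partial}{\partial x_i}\left( t \ln\left( \frac{\sum_{e \in E} \left(\exp(x_e/t) + \exp(-x_e/t)\right)}{2m} \right) \right) = \frac{\exp(x_i/t) - \exp(-x_i/t)}{\sum_{e \in E} \left(\exp(x_e/t) + \exp(-x_e/t)\right)}. \]
For all $i, j \in E$ and $x \in \mathbb{R}^E$, we have
\begin{align*}
\frac{\partial^2}{\partial x_i \partial x_j}\mathrm{smax}_t(x)
&= \frac{\partial^2}{\partial x_i \partial x_j}\left( t \ln\left( \frac{\sum_{e \in E} \left(\exp(x_e/t) + \exp(-x_e/t)\right)}{2m} \right) \right) \\
&= \frac{\partial}{\partial x_j}\left[ \frac{\exp(x_i/t) - \exp(-x_i/t)}{\sum_{e \in E} \left(\exp(x_e/t) + \exp(-x_e/t)\right)} \right] \\
&= \frac{1}{t} \cdot \frac{(1^T c)\, 1_{i=j}\, c_i - s_i s_j}{(1^T c)^2}.
\end{align*}
□

Lemma 7. The function $\mathrm{smax}_t$ is a convex continuously differentiable function and it has Lipschitz
continuous gradient with Lipschitz constant $1/t$ and
\[ \|x\|_\infty - t\ln(2m) \le \mathrm{smax}_t(x) \le \|x\|_\infty \]
for $x \in \mathbb{R}^E$.

Proof. By the formulation of the Hessian, for all $x, y \in \mathbb{R}^E$, we have
\[ y^T \left(\nabla^2\mathrm{smax}_t(x)\right) y \le \frac{\sum_i c_i y_i^2}{t\,(1^T c)} \le \frac{\sum_i c_i \left(\max_j y_j^2\right)}{t\,(1^T c)} \le \frac{1}{t}\|y\|_\infty^2. \]
On the other side, for all $x, y \in \mathbb{R}^E$, using $|s_i| \le c_i$ and Cauchy-Schwarz we have
\[ y^T s s^T y \le y^T c c^T y \le (1^T c)\left(y^T \mathrm{diag}(c)\, y\right), \]
and hence
\[ 0 \le y^T \left(\nabla^2\mathrm{smax}_t(x)\right) y. \]
Thus, the first part follows from Lemma 55. For the latter part, we have
\[ \|x\|_\infty \ge t \ln\left( \frac{\sum_{e \in E} \left(\exp(x_e/t) + \exp(-x_e/t)\right)}{2m} \right) \ge t \ln\left( \frac{\exp\left(\|x\|_\infty / t\right)}{2m} \right) = \|x\|_\infty - t\ln(2m). \]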
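The definitions above translate directly into code. The following Python sketch is an illustration under our own naming (`smax`, `smax_grad`). It evaluates $\mathrm{smax}_t$ and the gradient $s / (1^T c)$ of Lemma 6, and it subtracts $\|x\|_\infty / t$ inside the exponentials, a standard stabilization that is our addition and does not change the value.

```python
import math

def smax(x, t):
    """smax_t(x) = t * ln( sum_e (exp(x_e/t) + exp(-x_e/t)) / (2m) )."""
    m = len(x)
    z = max(abs(v) for v in x) / t  # shift so that every exponent is <= 0
    total = sum(math.exp(v / t - z) + math.exp(-v / t - z) for v in x)
    return t * (z + math.log(total / (2 * m)))

def smax_grad(x, t):
    """Gradient s / (1^T c); the common factor exp(-z) cancels in the ratio."""
    z = max(abs(v) for v in x) / t
    c = [math.exp(v / t - z) + math.exp(-v / t - z) for v in x]
    s = [math.exp(v / t - z) - math.exp(-v / t - z) for v in x]
    total = sum(c)
    return [se / total for se in s]
```

Since $|s_e| \le c_e$, the gradient always has $\ell_1$ norm at most 1, and the value returned by `smax` should lie between $\|x\|_\infty - t\ln(2m)$ and $\|x\|_\infty$ as stated in Lemma 7.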
4 Oblivious Routing

In the previous sections we saw how a circulation projection matrix can be used to solve max flow. 
In the next few sections we show how to construct a circulation projection matrix in order to obtain 
an almost linear time algorithm for solving max flow. 

Our proof focuses on the notion of (linear) oblivious routings. Rather than constructing the
circulation projection matrix directly, we show how the efficient construction of an oblivious routing
algorithm with a good competitive ratio immediately allows us to produce a circulation projection
matrix.

In the remainder of this section we formally define oblivious routings and prove the relationship
between oblivious routing and circulation projection matrices (Section 4.1), provide a high level
overview of our recursive approach and state the main theorems we will prove in later sections
(Section 4.2), and prove the main theorem about our construction of a circulation projection in almost
linear time with norm $2^{O(\sqrt{\log|V| \log\log|V|})}$, assuming the proofs in the later sections (Section 4.3).

4.1 From Oblivious Routing to Circulation Projection

Here we provide definitions and prove basic properties of oblivious routings, that is, fixed mappings
from demands to flows that meet the demands. While non-linear algorithms could be considered,
we restrict our attention to linear oblivious routing strategies and for notational convenience use
oblivious routing to refer to the linear subclass for the remainder of the paper.*

Definition 8 (Oblivious Routing). An oblivious routing on graph $G = (V,E)$ is a linear operator
$A \in \mathbb{R}^{E \times V}$ such that for all demands $\chi \in \mathbb{R}^V$, $B^T A \chi = \chi$. We call $A\chi$ the routing of $\chi$ by $A$.

Oblivious routings get their name due to the fact that given an oblivious routing $A$ and a set of
demands $D = \{\chi_1, \ldots, \chi_k\}$ one can route all the demands by routing each demand individually by
$A$ and obtain a multicommodity flow that was oblivious to the relationship between the demands.
We measure the competitive ratio† of such an oblivious routing strategy to be the worst relative
congestion of such a routing to the minimal congestion routing of the demands.

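Definition 9 (Competitive Ratio). The competitive ratio of oblivious routing $A \in \mathbb{R}^{E \times V}$, denoted
$\rho(A)$, is given by
\[ \rho(A) := \max_{\{\chi_i\} \,:\, \forall i \; \chi_i \perp 1} \frac{\mathrm{cong}(\{A\chi_i\})}{\mathrm{opt}(\{\chi_i\})}. \]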

At times it will be more convenient to analyze an oblivious routing as a linear algebraic object
rather than a combinatorial algorithm; towards this end we note that the competitive ratio of a linear
oblivious routing strategy can be gleaned from the operator norm of a related matrix ([ ] and [ ]).
Below we state and prove a generalization of this result to weighted graphs that will be vital to
relating $A$ to $P$.

Lemma 10. For any oblivious routing $A$ we have $\rho(A) = \|U^{-1} A B^T U\|_\infty$.

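Proof. For a set of demands $D$ let $D_\infty$ be the set of demands that result by taking the routing of
every demand in $D$ by $\mathrm{opt}(D)$ and splitting it up into demands on every edge corresponding to the
flow sent by $\mathrm{opt}(D)$. Now clearly $\mathrm{opt}(D) = \mathrm{opt}(D_\infty)$ since routing $D$ can be used to route $D_\infty$ and
vice versa, and clearly $\mathrm{cong}(AD) \le \mathrm{cong}(AD_\infty)$ by the linearity of $A$ (routing $D_\infty$ simply doesn't
reward $A$ for cancellations). Therefore
\begin{align*}
\rho(A) &= \max_D \frac{\mathrm{cong}(AD)}{\mathrm{opt}(D)} = \max_{D_\infty} \frac{\mathrm{cong}(AD_\infty)}{\mathrm{opt}(D_\infty)} = \max_{x \in \mathbb{R}^E} \frac{\left\| \sum_{e \in E} x_e \left| U^{-1} A \chi_e \right| \right\|_\infty}{\|U^{-1} x\|_\infty} \\
&= \max_{x \in \mathbb{R}^E} \frac{\left\| \left| U^{-1} A B^T \right| x \right\|_\infty}{\|U^{-1} x\|_\infty} = \max_{x \in \mathbb{R}^E} \frac{\left\| \left| U^{-1} A B^T U \right| x \right\|_\infty}{\|x\|_\infty}.
\end{align*}
□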

To make this lemma easily applicable in a variety of settings we make use of the following easy to 
prove lemma. 



* Note that the oblivious routing strategies considered in [ ] [15] [25] are all linear oblivious routing strategies.

† Again note that here and in the rest of the paper we focus our analysis on competitive ratio with respect to norm
$\|\cdot\|_\infty$. However, many of the results presented are easily generalizable to other norms, but this is outside the
scope of this paper.

Lemma 11 (Operator Norm Bounds). For all $A \in \mathbb{R}^{n \times m}$ we have that
\[ \|A\|_\infty = \left\| |A| \right\|_\infty = \left\| |A| 1 \right\|_\infty = \max_{i \in [n]} \left\| |A|^T 1_i \right\|_1. \]
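Lemma 11 says that the $\ell_\infty$ operator norm is simply the largest absolute row sum. As a small sanity check, written under our own naming and not taken from the paper, the sketch below compares that formula against a brute-force maximization of $\|Ax\|_\infty$ over sign vectors, which is where the maximum over the unit $\ell_\infty$ ball is attained.

```python
from itertools import product

def op_norm_inf(A):
    """|| |A| 1 ||_inf: the largest absolute row sum of A."""
    return max(sum(abs(a) for a in row) for row in A)

def op_norm_inf_bruteforce(A):
    """Maximize ||A x||_inf over x in {-1, +1}^m (exponential; tiny inputs only)."""
    m = len(A[0])
    best = 0.0
    for x in product((-1.0, 1.0), repeat=m):
        rows = (abs(sum(a * xi for a, xi in zip(row, x))) for row in A)
        best = max(best, max(rows))
    return best
```

The row-sum form is the one used in practice, since it costs time linear in the number of nonzero entries.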

The previous two lemmas make the connection between oblivious routings and circulation projection
matrices clear and below we prove this formally.

Lemma 12 (From Oblivious Routing to Circulation Projection). For oblivious routing $A \in \mathbb{R}^{E \times V}$
the matrix $\tilde{P} = I - AB^T$ is a circulation projection matrix such that $\|U^{-1}\tilde{P}U\|_\infty \le 1 + \rho(A)$.

Proof. First we verify that $\mathrm{im}(\tilde{P})$ is contained in cycle space:
\[ \forall x \in \mathbb{R}^E : \quad B^T \tilde{P} x = B^T x - B^T A B^T x = 0. \]
Next we check that $\tilde{P}$ is the identity on cycle space:
\[ \forall x \in \mathbb{R}^E \text{ s.t. } B^T x = 0 : \quad \tilde{P} x = x - A B^T x = x. \]
Finally we bound the $\ell_\infty$ norm of the scaled projection matrix:
\[ \|U^{-1}\tilde{P}U\|_\infty = \|I - U^{-1} A B^T U\|_\infty \le 1 + \rho(A). \]
□
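To see the lemma at work on a concrete instance, the sketch below builds $I - AB^T$ for a triangle with unit capacities, where the routing $A$ is a hypothetical choice of ours that sends every demand along the spanning path $0, 1, 2$. The orientation convention (row $e$ of $B$ has $-1$ at the tail and $+1$ at the head) is also an assumption made for this example.

```python
def matmul(X, Y):
    """Dense matrix product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def transpose(X):
    return [list(col) for col in zip(*X)]

def inf_norm(X):
    """Largest absolute row sum."""
    return max(sum(abs(v) for v in row) for row in X)

# Triangle on vertices 0, 1, 2 with oriented edges e0 = (0,1), e1 = (1,2), e2 = (0,2).
B = [[-1, 1, 0],
     [0, -1, 1],
     [-1, 0, 1]]

# Tree routing: edge e0 carries chi_1 + chi_2, edge e1 carries chi_2, edge e2 is unused.
# B^T A acts as the identity on demands whose entries sum to zero.
A = [[0, 1, 1],
     [0, 0, 1],
     [0, 0, 0]]

Bt = transpose(B)
ABt = matmul(A, Bt)
P = [[(1 if i == j else 0) - ABt[i][j] for j in range(3)] for i in range(3)]
```

Here `P` maps any vector $x$ to $x_{e_2}$ times the cycle $(-1, -1, 1)$, so its image is the cycle space and it fixes that cycle. With $U = I$ its norm is 1, within the bound $1 + \rho(A) = 3$.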

4.2 A Recursive Approach by Embeddings 

We construct an oblivious routing for a graph recursively. Given a complicated graph we show how 
to reduce computing an oblivious routing on this graph to computing an oblivious routing on a 
simpler graph. Key to these constructions will be the notion of an embedding which will allow us 
to reason about the competitive ratio of oblivious routing algorithms on graphs on the same vertex 
sets but different edge sets. 

Definition 13 (Embedding). Let $G = (V,E,\mu)$ and $G' = (V,E',\mu')$ denote two undirected capacitated
graphs on the same vertex set with incidence matrices $B \in \mathbb{R}^{E \times V}$ and $B' \in \mathbb{R}^{E' \times V}$ respectively.
An embedding from $G$ to $G'$ is a matrix $M \in \mathbb{R}^{E' \times E}$ such that $B'^T M = B^T$.

In other words an embedding is a map from flows in one graph to flows in another graph that
maintains the demands met by the flow. We can think of an embedding as a way of routing any
flow in one graph in another graph without increasing its congestion too much, or we can
think of an embedding as a way of routing one graph in another graph with low congestion. The
views are equivalent in the sense that they give equivalent definitions of quality of the embedding.

Definition 14 (Embedding Congestion). Let $M \in \mathbb{R}^{E' \times E}$ be an embedding from $G = (V,E,\mu)$ to
$G' = (V,E',\mu')$ and let $U \in \mathbb{R}^{E \times E}$ and $U' \in \mathbb{R}^{E' \times E'}$ denote the capacity matrices of $G$ and $G'$
respectively. The congestion of embedding $M$ is given by
\[ \mathrm{cong}(M) := \max_{x \in \mathbb{R}^E} \frac{\|U'^{-1} M x\|_\infty}{\|U^{-1} x\|_\infty} = \left\| U'^{-1} |M| U 1 \right\|_\infty. \]
We say $G$ embeds into $G'$ with congestion $\alpha$ if there exists an embedding $M$ from $G$ to $G'$ such
that $\mathrm{cong}(M) \le \alpha$.
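The closed form $\|U'^{-1}|M|U1\|_\infty$ can be evaluated one row at a time. The sketch below is a minimal illustration with names of our choosing: capacities are passed as plain lists holding the diagonals of $U$ and $U'$, and `is_embedding` checks the defining identity $B'^T M = B^T$ for dense inputs.

```python
def embedding_congestion(M, cap_src, cap_dst):
    """cong(M) = || U'^{-1} |M| U 1 ||_inf for an embedding M of G into G'."""
    return max(
        sum(abs(m) * u for m, u in zip(row, cap_src)) / u_dst
        for row, u_dst in zip(M, cap_dst)
    )

def is_embedding(M, B_src, B_dst):
    """Check B'^T M = B^T, where the rows of B_src and B_dst are indexed by edges."""
    n = len(B_src[0])
    for v in range(n):
        for e in range(len(B_src)):
            lhs = sum(B_dst[f][v] * M[f][e] for f in range(len(B_dst)))
            if lhs != B_src[e][v]:
                return False
    return True
```

For example, mapping a triangle with capacities $(1, 1, 2)$ onto its spanning path with capacities $(2, 4)$, by rerouting the third edge along the path, gives congestion $\max(3/2, 3/4) = 3/2$.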
Embeddings potentially allow us to reduce computing an oblivious routing in a complicated graph
to computing an oblivious routing in a simpler graph. More to the point, if we can embed a 
complicated graph in a simpler graph and we can efficiently embed the simple graph in the original 
graph, both with low congestion then we can just focus on constructing oblivious routings in the 
simpler graph. We prove this formally as follows. 

Lemma 15 (Embedding Lemma). Let $G = (V,E,\mu)$ and $G' = (V,E',\mu')$ denote two undirected
capacitated graphs on the same vertex sets, let $M \in \mathbb{R}^{E' \times E}$ denote an embedding from $G$ into $G'$, let
$M' \in \mathbb{R}^{E \times E'}$ denote an embedding from $G'$ into $G$, and let $A' \in \mathbb{R}^{E' \times V}$ denote an oblivious routing
algorithm on $G'$. Then $A = M'A'$ is an oblivious routing algorithm on $G$ and
\[ \rho(A) \le \mathrm{cong}(M) \cdot \mathrm{cong}(M') \cdot \rho(A'). \]

Proof. For all $\chi \in \mathbb{R}^V$ we have by definition of embeddings and oblivious routings that
\[ B^T A \chi = B^T M' A' \chi = B'^T A' \chi = \chi. \]
To bound $\rho(A)$ we let $U$ denote the capacity matrix of $G$, we let $U'$ denote the capacity matrix of
$G'$ and using Lemma 10 we get
\[ \rho(A) = \|U^{-1} A B^T U\|_\infty = \|U^{-1} M' A' B^T U\|_\infty. \]
Using that $M$ is an embedding and therefore $B'^T M = B^T$ we get
\[ \rho(A) = \|U^{-1} M' A' B'^T M U\|_\infty \le \|U^{-1} M' U'\|_\infty \cdot \|U'^{-1} A' B'^T U'\|_\infty \cdot \|U'^{-1} M U\|_\infty. \]
By the definition of competitive ratio and congestion we get the result. □

Note how in this lemma we only use the embedding from $G$ to $G'$ to certify the quality of flows in
$G'$; we do not actually need to apply this embedding in the reduction.

Using this concept we construct oblivious routings via recursive application of two techniques. 
First, in Section 5 we show how to take an arbitrary graph $G = (V,E)$ and approximate it by a
sparse graph $G' = (V,E')$ (i.e. one in which $|E'| = \tilde{O}(|V|)$) so that flows in $G$ can be routed in $G'$
with low congestion and such that there is an $\tilde{O}(1)$ embedding from $G'$ to $G$ that can be applied
in $\tilde{O}(|E|)$ time. As a result of proving how to efficiently create such flow sparsifiers we prove the
following theorem.

Theorem 16 (Edge Reduction). Let $G = (V,E,\mu)$ be an undirected capacitated graph with capacity
ratio $U \le \mathrm{poly}(|V|)$. In $\tilde{O}(|E|)$ time we can construct a graph $G'$ on the same vertex set with at
most $\tilde{O}(|V|)$ edges and capacity ratio at most $U \cdot \mathrm{poly}(|V|)$ such that given an oblivious routing $A'$
on $G'$ in $\tilde{O}(|E|)$ time we can construct an oblivious routing $A$ on $G$ such that
\[ \mathcal{T}(A) = \tilde{O}\left(|E| + \mathcal{T}(A')\right) \quad \text{and} \quad \rho(A) = \tilde{O}\left(\rho(A')\right). \]

Next, in Section 6 we show how to embed a graph into a collection of graphs consisting of trees
plus extra edges. Then we will show how to embed these graphs into better structured graphs
consisting of trees plus edges so that by simply removing degree 1 and degree 2 vertices we are left
with graphs with fewer vertices. Formally, we prove the following.

Theorem 17 (Vertex Reduction). Let $G = (V,E,\mu)$ be an undirected capacitated graph with
capacity ratio $U$. For all $t > 0$ in $\tilde{O}(t \cdot |E|)$ time we can compute graphs $G_1, \ldots, G_t$ each with at most
$\tilde{O}(|E| \log(U) / t)$ vertices, at most $|E|$ edges, and capacity ratio at most $|V| \cdot U$ such that given oblivious
routings $A_i$ for each $G_i$, in $\tilde{O}(t \cdot |E|)$ time we can compute an oblivious routing $A$ on $G$ such that
\[ \mathcal{T}(A) = \tilde{O}\left(t \cdot |E| + \sum_{i=1}^{t} \mathcal{T}(A_i)\right) \quad \text{and} \quad \rho(A) = \tilde{O}\left(\max_i \rho(A_i)\right). \]

In the next section we show that the careful application of these two ideas along with a powerful
primitive for routing on constant sized graphs suffices to produce an oblivious routing with the
desired properties.

4.3 Efficient Oblivious Routing Construction Proof 

Before we prove that the previous theorems suffice to provide an efficient low congestion oblivious
routing we provide one further lemma that will serve as the base case of our recursion. In particular
we show that electric routing can be used to obtain a routing algorithm with constant competitive
ratio for constant sized graphs.

Lemma 18 (Base Case). Let $G = (V,E,\mu)$ be an undirected capacitated graph and let us assign
weights to edges so that $W = U^2$. For $\mathcal{L} = B^T W B$ we have that $A = W B \mathcal{L}^\dagger$ is an oblivious
routing on $G$ with $\rho(A) \le \sqrt{|E|}$ and $\mathcal{T}(\mathcal{L}^\dagger) = \tilde{O}(|E|)$.



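Proof. To see that $A$ is an oblivious routing strategy we note that for any demands $\chi \in \mathbb{R}^V$ we
have $B^T A \chi = \mathcal{L}\mathcal{L}^\dagger \chi = \chi$. To bound $\rho(A)$ we note that by Lemma 10 and standard norm
inequalities we have
\[ \rho(A) = \max_{x \in \mathbb{R}^E} \frac{\|U^{-1} W B \mathcal{L}^\dagger B^T U x\|_\infty}{\|x\|_\infty} \le \max_{x \in \mathbb{R}^E} \frac{\|U B \mathcal{L}^\dagger B^T U x\|_2}{\frac{1}{\sqrt{|E|}}\|x\|_2} = \sqrt{|E|} \cdot \|U B \mathcal{L}^\dagger B^T U\|_2. \]
The result follows from the fact in [29] that $\Pi = U B \mathcal{L}^\dagger B^T U$ is an orthogonal projection, and
therefore $\|\Pi\|_2 \le 1$, and the fact in [31, 12, 14, 11] that $\mathcal{T}(\mathcal{L}^\dagger) = \tilde{O}(|E|)$. □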

Assuming Theorem 16 and Theorem 17, which we prove in the next two sections, we prove that 
low-congestion oblivious routings can be constructed efficiently. 

Theorem 19 (Recursive construction). Given an undirected capacitated graph $G = (V,E,\mu)$ with
capacity ratio $U$. Assume $U = \mathrm{poly}(|V|)$. We can construct an oblivious routing algorithm $A$ on $G$
in time
\[ O\left(|E| \, 2^{O\left(\sqrt{\log|V| \log\log|V|}\right)}\right) \]
such that
\[ \mathcal{T}(A) = |E| \, 2^{O\left(\sqrt{\log|V| \log\log|V|}\right)} \quad \text{and} \quad \rho(A) = 2^{O\left(\sqrt{\log|V| \log\log|V|}\right)}. \]
Proof. Let $c$ be the constant hidden in the exponent terms, including $\tilde{O}(\cdot)$ and $\mathrm{poly}(\cdot)$ in Theorem 16
and Theorem 17. Apply Theorem 16 to construct a sparse graph $G^{(1)}$, then apply Theorem 17
with $t = \left\lceil 2^{\sqrt{\log|V| \log\log|V|}} \right\rceil$ to get $t$ graphs $G^{(1)}_1, \ldots, G^{(1)}_t$ such that each graph has at most
$O\left(\frac{1}{t}|E| \log^{2c}|V| \log(U)\right)$ vertices and at most $U \cdot |V|^{2c}$ capacity ratio.

Repeat this process on each $G^{(1)}_i$; it produces $t^2$ graphs $G^{(2)}_1, \ldots, G^{(2)}_{t^2}$. Keep doing this until all
graphs $G_i$ produced have $O(1)$ vertices. Let $k$ be the highest level we go through in this process.
Since at the $k$-th level the number of vertices of each graph is at most
$O\left(\frac{1}{t^k}|E| \log^{2ck}|V| \log^{k}\left(U|V|^{2ck}\right)\right)$, we have $k = O\left(\sqrt{\frac{\log|V|}{\log\log|V|}}\right)$.

On each graph $G_i$, we use Theorem 18 to get an oblivious routing algorithm $A_i$ for each $G_i$ with
\[ \mathcal{T}(A_i) = O(1) \quad \text{and} \quad \rho(A_i) = O(1). \]
Then, Theorems 17 and 16 show that we have an oblivious routing algorithm $A$ for $G$ with
\[ \mathcal{T}(A) = O\left(t k |E| \log^{2ck}(|V|) \log^{k}\left(U|V|^{2ck}\right)\right) \quad \text{and} \quad \rho(A) = O\left(\log^{2ck}|V| \log^{k}\left(U|V|^{2ck}\right)\right). \]
The result follows from $k = O\left(\sqrt{\frac{\log|V|}{\log\log|V|}}\right)$ and $t = \left\lceil 2^{\sqrt{\log|V| \log\log|V|}} \right\rceil$. □
Using Theorem 19, Lemma 12 and Theorem 4, we have the following almost linear time max flow
algorithm on undirected graphs.

Theorem 20. Given an undirected capacitated graph $G = (V,E,\mu)$ with capacity ratio $U$. Assume
$U = \mathrm{poly}(|V|)$. There is an algorithm that finds a $(1-\varepsilon)$-approximate maximum flow in time
\[ O\left( \frac{|E| \, 2^{O\left(\sqrt{\log|V| \log\log|V|}\right)}}{\varepsilon^2} \right). \]


5 Flow Sparsifiers 

In this section, we define and construct efficient flow sparsifiers, that is, algorithms for sparsifying
a graph and mapping flows in the sparse graph into the original with low congestion in time $\tilde{O}(m)$.
Notice that our flow sparsifiers aim to reduce the number of edges, and are different from the flow
sparsifiers of Leighton and Moitra [ ], which work in a different setting and reduce the number of
vertices.

Definition 21 (Flow Sparsifier). We call an algorithm an efficient $(h, \varepsilon, \alpha)$-flow sparsifier if on
input graph $G = (V,E,\mu)$ with capacity ratio $U$ it outputs a graph $G' = (V,E',\mu')$ with capacity
ratio $U' \le U \cdot \mathrm{poly}(|V|)$ and an embedding $M : \mathbb{R}^{E'} \to \mathbb{R}^{E}$ of $G'$ into $G$ with the following properties:

• Sparsity: $G'$ is $h$-sparse, i.e.

\[ |E'| \le h. \]

• Cut Approximation: $G'$ is an $\varepsilon$-cut approximation of $G$, i.e.

\[ \forall S \subseteq V : \quad (1-\varepsilon)\,\mu(\partial(S)) \le \mu'(\partial(S)) \le (1+\varepsilon)\,\mu(\partial(S)). \]

• Flow Approximation: $M$ has congestion at most $\alpha$, i.e.

\[ \mathrm{cong}(M) \le \alpha. \]

• Efficiency: Both the algorithm and any matrix-vector product involving $M$ can be run in
$\tilde{O}(m)$ time.

Flow sparsifiers allow us to solve a multi-commodity flow problem on a possibly dense graph $G$ by
converting $G$ into a sparse graph $G'$ and solving the flow problem on $G'$, while suffering a loss of a
factor of $\alpha$ in the congestion when mapping the solution back to $G$ using $M$.

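Theorem 22. Consider a graph $G = (V,E,\mu)$ and let $G' = (V,E',\mu')$ be given by an $(h, \varepsilon, \alpha)$-flow
sparsifier of $G$. Then, for any set of $k$ demands $D = \{\chi_1, \chi_2, \ldots, \chi_k\}$ between vertex pairs of $V$, we
have:
\[ \mathrm{opt}_{G'}(D) \le \frac{O(\log k)}{1-\varepsilon} \cdot \mathrm{opt}_{G}(D). \qquad (2) \]
Given the optimum flow $\{f_i^*\}$ over $G'$, we have
\[ \mathrm{cong}_{G}(\{M f_i^*\}) \le \alpha \cdot \mathrm{opt}_{G'}(D) \le \frac{O(\alpha \log k)}{1-\varepsilon} \cdot \mathrm{opt}_{G}(D). \]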

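Proof. By the flow-cut gap theorem of Aumann and Rabani [ ], we have that, for any set of $k$
demands $D$ on $V$ we have:
\[ \mathrm{opt}_{G}(D) \le O(\log k) \cdot \max_{S \subset V} \frac{D(\partial(S))}{\mu(\partial(S))}, \]
where $D(\partial(S))$ denotes the total amount of demand separated by the cut between $S$ and $\bar{S}$. Applying
this bound to $G'$ and using that any cut $S \subseteq V$ in $G'$ has capacity $\mu'(\partial(S)) \ge (1-\varepsilon)\,\mu(\partial(S))$, we have:
\[ \mathrm{opt}_{G'}(D) \le O(\log k) \cdot \max_{S \subset V} \frac{D(\partial(S))}{\mu'(\partial(S))} \le \frac{O(\log k)}{1-\varepsilon} \cdot \max_{S \subset V} \frac{D(\partial(S))}{\mu(\partial(S))} \le \frac{O(\log k)}{1-\varepsilon} \cdot \mathrm{opt}_{G}(D), \]
where the last step holds because every cut lower-bounds the congestion of any routing in $G$.
The second part of the theorem follows as a consequence of the definition of the congestion of the
embedding $M$. □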

Our flow sparsifiers should be compared with the cut-based decompositions of Räcke [25]. Räcke
constructs a probability distribution over trees and gives explicit embeddings from $G$ to this
distribution and backwards, achieving a congestion of $O(\log n)$. However, this distribution over trees can
include up to $O(n \log n)$ trees and is not suitable for an almost linear time algorithm. Flow
sparsifiers answer these problems by embedding $G$ into a single graph $G'$, which is larger than a tree,
but still sparse. Moreover, they provide an explicit efficient embedding of $G'$ into $G$. Interestingly,
the embedding from $G$ to $G'$ is not necessary for our notion of flow sparsifier, and is replaced by
the cut-approximation guarantee. This requirement, together with the application of the flow-cut
gap [ ], lets us argue that the optimal congestion of a $k$-commodity flow problem can change at
most by a factor of $O(\log k)$ between $G$ and $G'$.

5.0.1 Main Theorem on Flow Sparsifiers and Proof of Theorem 16

The main goal of this section will be to show the following theorem:

Theorem 23. For any constant $\varepsilon \in (0,1)$, there is an efficient $(\tilde{O}(n), \varepsilon, \tilde{O}(1))$-flow sparsifier.

Assuming Theorem 23, we can now prove Theorem 16, the main theorem necessary for edge
reduction in our construction of low-congestion projections.

Proof of Theorem 16. We apply the flow sparsifier of Theorem 23 to $G = (V,E,\mu)$ and obtain
output $G' = (V,E',\mu')$ with embedding $M$. By the definition of efficient flow sparsifier, we know
that the capacity ratio $U'$ of $G'$ is at most $U \cdot \mathrm{poly}(|V|)$, as required. Moreover, again by Theorem 23,
$G'$ has at most $\tilde{O}(|V|)$ edges. Given an oblivious routing $A'$ on $G'$ consider the oblivious routing
$A = MA'$. By the definition of flow sparsifier, we have that $\mathcal{T}(M) = \tilde{O}(|E|)$. Hence
\[ \mathcal{T}(A) = \mathcal{T}(M) + \mathcal{T}(A') = \tilde{O}(|E|) + \mathcal{T}(A'). \]

To complete the proof, we bound the competitive ratio $\rho(A)$. Using the same argument as in
Lemma 10, we can write $\rho(A)$ as
\[ \rho(A) = \max_{D} \frac{\mathrm{cong}_{G}(AD)}{\mathrm{opt}_{G}(D)} \le \max_{D_\infty} \frac{\mathrm{cong}_{G}(AD_\infty)}{\mathrm{opt}_{G}(D_\infty)}, \]
where $D_\infty$ is the set of demands that result by taking the routing of every demand in $D$ by $\mathrm{opt}(D)$
and splitting it up into demands on every edge corresponding to the flow sent by $\mathrm{opt}(D)$. Notice
that $D_\infty$ has at most $|E|$ demands that are routed between pairs of vertices in $V$. Then, because
$G'$ is an $\varepsilon$-cut approximation to $G$, the flow-cut gap of Aumann and Rabani [ ] guarantees that
\[ \mathrm{opt}_{G}(D_\infty) \ge \frac{1}{O(\log n)} \, \mathrm{opt}_{G'}(D_\infty). \]
As a result, we obtain:
\begin{align*}
\rho(A) &\le O(\log n) \cdot \max_{D_\infty} \frac{\mathrm{cong}_{G}(AD_\infty)}{\mathrm{opt}_{G'}(D_\infty)} = O(\log n) \cdot \max_{D_\infty} \frac{\mathrm{cong}_{G}(MA'D_\infty)}{\mathrm{opt}_{G'}(D_\infty)} \\
&\le O(\log n) \cdot \mathrm{cong}(M) \cdot \max_{D_\infty} \frac{\mathrm{cong}_{G'}(A'D_\infty)}{\mathrm{opt}_{G'}(D_\infty)} \le \tilde{O}(\rho(A')).
\end{align*}

5.0.2 Techniques

We will construct flow sparsifiers by taking as a starting point the construction of spectral
sparsifiers of Spielman and Teng [32]. Their construction achieves a sparsity of $\tilde{O}(n/\varepsilon^2)$ edges, while
guaranteeing an $\varepsilon$-spectral approximation.

As the spectral approximation implies the cut approximation, the construction in [32] suffices to
meet the first two conditions in Definition 21. Moreover, their algorithm also runs in time $\tilde{O}(m)$,
meeting the fourth condition. Hence, to complete the proof of Theorem 23, we will modify the
construction of Spielman and Teng to endow their spectral sparsifier $G'$ with an embedding $M$
onto $G$ of low congestion that can be both computed and invoked efficiently. The main tool we use
in constructing $M$ is the notion of electrical-flow routing and the fact that electrical-flow routing
schemes achieve a low competitive ratio on near-expanders and subsets thereof [9, 15].

To exploit this fact and construct a flow sparsifier, we follow Spielman and Teng [ ] and partition
the input graph into vertex sets, where each set induces a near-expander and most edges of
the graph do not cross set boundaries. We then sparsify these induced subgraphs using standard
sparsification techniques and iterate on the edges not in the subgraphs. As each iteration removes
a constant fraction of the edges, standard sparsification techniques give us the sparsity
and cut approximation properties for free. The flow embedding follows from the fact that the
electrical-flow routing is competitive within these near-expander subgraphs.

In the next two subsections, we introduce the necessary concepts about electrical-flow routing and
prove that electrical-flow routing achieves a low competitive ratio over near-expanders (and subsets of
near-expanders).

5.1 Subgraph Routing 

Given an oblivious routing strategy $A$ we may be interested only in routing demands coming from
a subset of edges $F \subseteq E$. In this setting, given a set of demands $D$ routable in $F$ we let $\mathrm{opt}^F(D)$
denote the minimal congestion achieved by any routing restricted to only sending flow on edges in
$F$ and we measure the $F$-competitive ratio of $A$ by
\[ \rho^F(A) := \max_{D \text{ routable in } F} \frac{\mathrm{cong}(AD)}{\mathrm{opt}^F(D)}. \]
As before, we can upper bound the $F$-competitive ratio $\rho^F(A)$ by operator norms.

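Lemma 24. Let $1_F \in \mathbb{R}^E$ denote the indicator vector for set $F$ (i.e. $1_F(e) = 1$ if $e \in F$ and
$1_F(e) = 0$ otherwise) and let $I_F = \mathrm{diag}(1_F)$. For any $F \subseteq E$ we have
\[ \rho^F(A) = \left\| \left| U^{-1} A B^T U I_F \right| \right\|_\infty. \]

Proof. We use the same reasoning as the non-subgraph case. For a set of demands $D = \{\chi_i\}$,
we consider $D_F^\infty$, the demands on the edges in $F$ used by $\mathrm{opt}^F(D)$. Then, it is the case that
$\mathrm{opt}^F(D) = \mathrm{opt}^F(D_F^\infty)$ and we know that the cost of obliviously routing $D_F^\infty$ is greater than the cost of
obliviously routing $D$. Therefore we have
\begin{align*}
\rho^F(A) &= \max_{x \in \mathbb{R}^E : I_{E \setminus F}\, x = 0} \frac{\left\| \sum_{e \in E} \left| U^{-1} A B^T 1_e x_e \right| \right\|_\infty}{\|U^{-1} x\|_\infty} \\
&= \max_{y \in \mathbb{R}^E : I_{E \setminus F}\, y = 0} \frac{\left\| \sum_{e \in E} \left| U^{-1} A B^T U 1_e y_e \right| \right\|_\infty}{\|y\|_\infty} \\
&= \max_{y \in \mathbb{R}^E} \frac{\left\| \sum_{e \in E} \left| U^{-1} A B^T U I_F 1_e y_e \right| \right\|_\infty}{\|y\|_\infty} \qquad \text{(Having $y_e \ne 0$ for $e \in E \setminus F$ decreases the ratio.)} \\
&= \left\| \left| U^{-1} A B^T U I_F \right| \right\|_\infty.
\end{align*}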
5.2 Electrical-Flow Routings

In this section, we define the notion of electrical-flow routing and prove the results necessary to
construct flow sparsifiers. Recall that $R$ is the diagonal matrix of resistances and the Laplacian $\mathcal{L}$
is defined as $B^T R^{-1} B$. For the rest of this section, we assume that resistances are set as $R = U^{-1}$.

Definition 25. Consider a graph $G = (V,E,\mu)$ and set the edge resistances as $r_e = \frac{1}{\mu_e}$ for all
$e \in E$. The oblivious electrical-flow routing strategy is the linear operator $A_{\mathcal{E}}$ defined as
\[ A_{\mathcal{E}} := R^{-1} B \mathcal{L}^\dagger. \]
In words, the electrical-flow routing strategy is the routing scheme that, for each demand $\chi$, sends
the electrical flow with boundary condition $\chi$ on the graph $G$ with resistances $R = U^{-1}$.
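For intuition, the electrical-flow routing of a single demand can be computed on a small graph by solving the Laplacian system directly. The sketch below is a toy illustration, not the nearly-linear-time solver the paper relies on. It assumes a connected graph, grounds the last vertex instead of forming $\mathcal{L}^\dagger$ explicitly, uses conductances $1/r_e = \mu_e$, and orients each edge from tail to head; the function names are ours.

```python
def solve(M, b):
    """Solve M z = b by Gaussian elimination with partial pivoting (M nonsingular)."""
    n = len(b)
    aug = [list(row) + [bi] for row, bi in zip(M, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    z = [0.0] * n
    for r in reversed(range(n)):
        rest = sum(aug[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (aug[r][n] - rest) / aug[r][r]
    return z

def electrical_routing(n, edges, capacities, demand):
    """Flow R^{-1} B v with L v = demand; demand[v] is the net inflow at v and sums to 0."""
    conductance = [float(mu) for mu in capacities]  # 1 / r_e = mu_e
    lap = [[0.0] * n for _ in range(n)]
    for (a, b), w in zip(edges, conductance):
        lap[a][a] += w
        lap[b][b] += w
        lap[a][b] -= w
        lap[b][a] -= w
    g = n - 1  # ground the last vertex: v[g] = 0
    grounded = [row[:g] for row in lap[:g]]
    v = solve(grounded, list(demand[:g])) + [0.0]
    return [w * (v[b] - v[a]) for (a, b), w in zip(edges, conductance)]
```

On a unit-capacity triangle with edges $(0,1)$, $(1,2)$, $(0,2)$, routing one unit from vertex 0 to vertex 2 sends $2/3$ along the direct edge and $1/3$ along the two-edge path, as expected from the resistances 1 and 2 in parallel.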

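For the electrical-flow routing strategy $A_{\mathcal{E}}$, the upper bound on the competitive ratio $\rho(A_{\mathcal{E}})$ in
Lemma 10 can be rephrased in terms of the voltages induced on $G$ by electrically routing an edge
$e \in E$. This interpretation appears in [9, 15].

Lemma 26. Let $A_{\mathcal{E}}$ be the electrical-flow routing strategy. For an edge $e \in E$, we let the voltage
vector $v_e \in \mathbb{R}^V$ be given by $v_e := \mathcal{L}^\dagger \chi_e$. For $R = U^{-1}$, we then have
\[ \rho(A_{\mathcal{E}}) \le \max_{e \in E} \sum_{(a,b) \in E} \frac{|v_e(a) - v_e(b)|}{r_{ab}}. \]

Proof. We have:
\[ \rho(A_{\mathcal{E}}) \le \|B \mathcal{L}^\dagger B^T R^{-1}\|_\infty = \max_{e \in E} \|R^{-1} B \mathcal{L}^\dagger B^T 1_e\|_1 = \max_{e \in E} \sum_{(a,b) \in E} \frac{|v_e(a) - v_e(b)|}{r_{ab}}. \]
□

The same reasoning can be extended to the subgraph-routing case to obtain the following lemma.

Lemma 27. For $F \subseteq E$ and $R = U^{-1}$ we have
\[ \rho^F(A_{\mathcal{E}}) \le \max_{e \in E} \sum_{(a,b) \in F} \frac{|v_e(a) - v_e(b)|}{r_{ab}}. \]

Proof. As before, we have:
\begin{align*}
\rho^F(A_{\mathcal{E}}) &\le \|B \mathcal{L}^\dagger B^T R^{-1} I_F\|_\infty \qquad \text{(By Lemma 24)} \\
&= \max_{e \in E} \|I_F R^{-1} B \mathcal{L}^\dagger B^T 1_e\|_1 = \max_{e \in E} \sum_{(a,b) \in F} \frac{|v_e(a) - v_e(b)|}{r_{ab}}.
\end{align*}
□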
5.2.1 Bounding the Congestion

In this section, we prove that we can bound the $F$-competitive ratio of the oblivious
electrical-routing strategy as long as the edges $F$ that the optimum flow is allowed to route over are contained
within an induced expander $G(U) = (U, E(U))$ for some $U \subseteq V$. Towards this we provide and
prove the following lemma. This is a generalization of a similar lemma proved in [9].

Lemma 28. For weighted graph $G = (V,E,w)$ with integer weights and vertex subset $U \subseteq V$ the
following holds:
\[ \forall e \in E : \quad \|I_{E(U)} R^{-1} B \mathcal{L}^\dagger \chi_e\|_1 \le \frac{8\log(\mathrm{vol}(G(U)))}{\Phi(G(U))^2}. \]

Proof. Let $v := \mathcal{L}^\dagger \chi_e$ and recall that with this definition
\[ \|I_{E(U)} R^{-1} B \mathcal{L}^\dagger \chi_e\|_1 = \sum_{(a,b) \in E(U)} \frac{|v(a) - v(b)|}{r_{ab}} = \sum_{(a,b) \in E(U)} w_{ab}\,|v(a) - v(b)|. \qquad (3) \]
We define the following vertex subsets:
\[ \forall x \in \mathbb{R} : \quad S_x^{\le} := \{a \in U \mid v(a) \le x\} \quad \text{and} \quad S_x^{\ge} := \{a \in U \mid v(a) \ge x\}. \]
Since adding a multiple of the all-ones vector to $v$ does not change the quantity of interest in
Equation 3, we can assume without loss of generality that
\[ \mathrm{vol}_{G(U)}(S_0^{\ge}) \ge \frac{1}{2}\,\mathrm{vol}(G(U)) \quad \text{and} \quad \mathrm{vol}_{G(U)}(S_0^{\le}) \ge \frac{1}{2}\,\mathrm{vol}(G(U)). \]
For any vertex subset $S \subseteq U$, we denote the flow out of $S$ and the weight out of $S$ by
\[ f(S) := \sum_{e=(a,b) \in E(U) \cap \partial(S)} w_e\,|v(a) - v(b)|, \quad \text{and} \quad w(S) := \sum_{e \in E(U) \cap \partial(S)} w_e. \]
At this point, we define a collection of subsets $\{C_i \subseteq S_0^{\ge}\}$. For an increasing sequence of real
numbers $\{c_i\}$, we let $C_i := S_{c_i}^{\ge}$. We define the sequence $\{c_i\}$ inductively as follows:
\[ c_0 := 0, \quad c_i := c_{i-1} + \Delta_{i-1}, \quad \text{and} \quad \Delta_i := 2\,\frac{f(C_i)}{w(C_i)}. \]
In words, $c_{i+1}$ equals the sum of $c_i$ and an increase $\Delta_i$ which depends on how much the cut
$\partial(C_i) \cap E(U)$ was congested by the electrical flow.

Now, let $l_i := w\left(\partial_{E(U)}(C_{i-1}) - \partial_{E(U)}(C_i)\right)$, i.e. the weight of the edges in $E(U)$ cut by $C_{i-1}$ but not
by $C_i$. We get
\begin{align*}
\mathrm{vol}(C_{i+1}) &\le \mathrm{vol}(C_i) - l_i \\
&\le \mathrm{vol}(C_i) - \frac{w(C_i)}{2} \qquad \text{(By choice of $l_i$ and $\Delta_i$)} \\
&\le \mathrm{vol}(C_i) - \frac{1}{2}\,\mathrm{vol}(C_i)\,\Phi(G(U)) \qquad \text{(Definition of conductance)}
\end{align*}
Applying this inductively and using our assumption on $\mathrm{vol}(S_0^{\ge})$ we have that
\[ \mathrm{vol}(C_i) \le \left(1 - \frac{1}{2}\Phi(G(U))\right)^i \mathrm{vol}(C_0) \le \left(1 - \frac{1}{2}\Phi(G(U))\right)^i \mathrm{vol}(G(U)). \]
Since $\Phi(G(U)) \in (0,1)$, for $i + 1 = \frac{2\log(\mathrm{vol}(G(U)))}{\Phi(G(U))}$ we have that $\mathrm{vol}(C_i) \le \frac{1}{2}$. Since $\mathrm{vol}(C_i)$ decreases
monotonically with $i$, if we let $r$ be the smallest value such that $C_{r+1} = \emptyset$, we must have
\[ r \le \frac{2 \cdot \log(\mathrm{vol}(G(U)))}{\Phi(G(U))}. \]
Since $v$ corresponds to a unit flow, we know that $f(C_i) \le 1$ for all $i$. Moreover, by the definition of
conductance we know that $w(C_i) \ge \Phi(G(U)) \cdot \mathrm{vol}(C_i)$. Therefore,
\[ \Delta_i \le \frac{2}{\Phi(G(U)) \cdot \mathrm{vol}(C_i)}. \]
We can now bound the contribution of $C_0$ to the volume of the linear embedding $v$. In the following,
for a vertex $a \in V$, we let $d(a) := \sum_{e = \{a,b\} \in E(U)} w_e$ be the degree of $a$ in $E(U)$.
\begin{align*}
\sum_{a \in C_0} d(a)\,v(a) &= \sum_{i=0}^{r} \left( \sum_{a \in C_i - C_{i+1}} d(a)\,v(a) \right) \\
&\le \sum_{i=0}^{r} \left( \sum_{a \in C_i - C_{i+1}} d(a) \left( \sum_{j=0}^{i} \Delta_j \right) \right) \qquad \text{(By definition of $C_i$)} \\
&= \sum_{i=0}^{r} \left(\mathrm{vol}(C_i) - \mathrm{vol}(C_{i+1})\right) \cdot \left( \sum_{j=0}^{i} \Delta_j \right) \\
&= \sum_{i=0}^{r} \mathrm{vol}(C_i)\,\Delta_i \le \frac{2r}{\Phi(G(U))} \qquad \text{(Rearrangement and fact that $\mathrm{vol}(C_{r+1}) = 0$)}
\end{align*}
By repeating the same argument on $S_0^{\le}$, we get that $\sum_{a \in S_0^{\le}} d(a)\,|v(a)| \le \frac{2r}{\Phi(G(U))}$. Putting this all
together yields
\[ \|I_{E(U)} R^{-1} B \mathcal{L}^\dagger \chi_e\|_1 = \sum_{(a,b) \in E(U)} w_{ab}\,|v(a) - v(b)| \le \sum_{a \in U} d(a)\,|v(a)| \le \frac{4r}{\Phi(G(U))}. \]

From this lemma and Lemma 27, the following is immediate:

Lemma 29. Let $F \subseteq E$ be contained within some vertex induced subgraph $G(U)$, then for $R = U^{-1}$
we have
\[ \rho^{F}(R^{-1} B \mathcal{L}^\dagger) \le \rho^{E(U)}(R^{-1} B \mathcal{L}^\dagger) \le \frac{8\log(\mathrm{vol}(G(U)))}{\Phi(G(U))^2}. \]
5.3 Construction and Analysis of Flow Sparsifiers

In the remainder of this section we show how to produce an efficient $O(\log^c n)$-flow sparsifier for some
fixed constant $c$, proving Theorem 23. In this version of the paper, we make no attempt to optimize
the value of $c$. For the rest of this section, we again assume that we choose the conductance of an
edge to be the capacity of an edge, i.e. $U = W = R^{-1}$.

As discussed before, our approach follows closely that of Spielman and Teng [32] to the construction
of spectral sparsifiers. The first step of this line of attack is to reduce the problem to the unweighted
case.

Lemma 30. Given an efficient $(h, \varepsilon, \alpha)$-flow-sparsifier algorithm for unweighted graphs, it is
possible to construct an efficient $(h \cdot \log U, \varepsilon, \alpha)$-flow-sparsifier algorithm for weighted graphs
$G = (V,E,\mu)$ with capacity ratio $U$ obeying
\[ U = \frac{\max_e \mu_e}{\min_e \mu_e} = \mathrm{poly}(|V|). \]

Proof. We write each edge in binary so that $G = \sum_{i=0}^{\log U} 2^i G_i$ for some unweighted graphs
$\{G_i = (V, E_i)\}_{i \in [\log U]}$ where $|E_i| \le m$ for all $i$. We now apply the unweighted flow-sparsifier to each $G_i$
in turn to obtain graphs $\{G'_i\}$. We let $G' := \sum_{i=0}^{\log U} 2^i G'_i$ be the weighted flow-sparsified graph. By
the assumption on the unweighted flow-sparsifier, each $G'_i$ is $h$-sparse, so that $G'$ must have at most
$h \cdot \log U$ edges. Similarly, $G'$ is an $\varepsilon$-cut approximation of $G$, as each $G'_i$ is an $\varepsilon$-cut approximation of
the corresponding $G_i$. Letting $M_i$ be the embedding of $G'_i$ into $G_i$, we can consider the embedding
$M = \sum_{i=0}^{\log U} 2^i M_i$ of $G'$ into $G$. As each $M_i$ has congestion bounded by $\alpha$, it must be the case that
$M$ also has congestion bounded by $\alpha$. The time to run the weighted flow sparsifier and to invoke
$M$ is now $\tilde{O}(m) \cdot \log U = \tilde{O}(m)$ by our assumption on $U$. □
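The binary decomposition in this proof is straightforward to make concrete. The following sketch is an illustration only: it assumes capacities have been scaled to positive integers, represents a graph as a dictionary from edges to weights, and uses hypothetical helper names. Layer $i$ holds exactly the edges whose weight has bit $i$ set, so that $G = \sum_i 2^i G_i$.

```python
def binary_layers(weights):
    """Split positive integer edge weights into unweighted layers G_0, G_1, ..."""
    layers = []
    i = 0
    while any(w >> i for w in weights.values()):
        layers.append({e for e, w in weights.items() if (w >> i) & 1})
        i += 1
    return layers

def recombine(layers):
    """Rebuild the weighted graph sum_i 2^i G_i from its unweighted layers."""
    total = {}
    for i, layer in enumerate(layers):
        for e in layer:
            total[e] = total.get(e, 0) + (1 << i)
    return total
```

The number of layers equals the bit length of the largest weight, which is $O(\log U)$ once the smallest capacity is normalized to 1. This is where the extra $\log U$ factor in the sparsity bound comes from.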

The next step is to construct a routine which flow-sparsifies a constant fraction of the edges of $E$.
This routine will then be applied iteratively to produce the final flow-sparsifier.

Lemma 31. On input an unweighted graph $G = (V,E)$, there is an algorithm that runs in
$\tilde{O}(m)$ and computes a partition of $E$ into $(F, \bar{F})$, an edge set $F' \subseteq F$ with weight vector
$w_{F'} \in \mathbb{R}^E$, $\mathrm{support}(w_{F'}) = F'$, and an embedding $H : \mathbb{R}^{F'} \to \mathbb{R}^{E}$ with the following properties:

1. $F$ contains most of the volume of $G$, i.e.
\[ |F| \ge \frac{|E|}{2}; \]

2. $F'$ contains only $\tilde{O}(n)$ edges, i.e. $|F'| \le \tilde{O}(n)$.

3. The weights $w_{F'}$ are bounded:
\[ \forall e \in F', \quad \frac{1}{\mathrm{poly}(n)} \le w_{F'}(e) \le n. \]

4. The graph $H' = (V,F',w_{F'})$ is an $\varepsilon$-cut approximation to $H = (V,F)$, i.e. for all $S \subseteq V$:
\[ (1-\varepsilon)\,|\partial_H(S)| \le w_{F'}(\partial_{H'}(S)) \le (1+\varepsilon)\,|\partial_H(S)|. \]

5. The embedding $H$ from $H' = (V,F',w_{F'})$ to $G$ has bounded congestion
\[ \mathrm{cong}(H) = \tilde{O}(1) \]
and can be applied in time $\tilde{O}(m)$.

Given Lemma 30 and Lemma 31, it is straightforward to complete the proof of Theorem 23. 

Proof. Using Lemma 30, we reduce the objective of Theorem 23 to running a $(\tilde{O}(n), \varepsilon, \tilde{O}(1))$-flow sparsifier on $\log U$ unweighted graphs, where we use the fact that $U \le \mathrm{poly}(n)$. To construct this unweighted flow sparsifier, we apply Lemma 31 iteratively as follows. Starting with the instance unweighted graph $G_1 = (V, E_1)$, we run the algorithm of Lemma 31 on the current graph $G_t = (V, E_t)$ to produce the sets $F_t$ and $F'_t$, the weight vector $w_{F'_t}$ and the embedding $H_t : \mathbb{R}^{F'_t} \to \mathbb{R}^{E}$.

To proceed to the next iteration, we then define $E_{t+1} = E_t \setminus F_t$ and move on to $G_{t+1}$.

By Lemma 31, at every iteration $t$, $|F_t| \ge \frac{1}{2} \cdot |E_t|$, so that $|E_{t+1}| \le \frac{1}{2} \cdot |E_t|$. This shows that there can be at most $T \le \log(|E_1|) = O(\log n)$ iterations.

After the last iteration $T$, we have effectively partitioned $E_1$ into disjoint subsets $\{F_t\}_{t \in [T]}$, where each $F_t$ is well-approximated by the weighted edge set $F'_t$. We then output the weighted graph $G' = (V, E' = \bigcup_{t=1}^{T} F'_t, w' = \sum_{t=1}^{T} w_{F'_t})$, which is the sum of the disjoint weighted edge sets $\{F'_t\}_{t \in [T]}$. We also output the embedding $M : \mathbb{R}^{E'} \to \mathbb{R}^{E}$ from $G'$ to $G$, defined as the direct sum

$$M = \bigoplus_{t=1}^{T} H_t.$$

In words, $M$ maps an edge $e' \in E'$ by finding $t$ for which $e' \in F'_t$ and applying the corresponding $H_t$.

We are now ready to prove that this algorithm with output $G'$ and $M$ is an efficient $(\tilde{O}(n), \varepsilon, \tilde{O}(1))$-flow sparsifier. To bound the capacity ratio $U'$ of $G'$, we notice that

$$U' \le \max_t \frac{\max_{e \in F'_t} w_{F'_t}(e)}{\min_{e \in F'_t} w_{F'_t}(e)} \le \mathrm{poly}(n),$$

where we used the fact that the sets $F'_t$ are disjoint and the guarantee on the range of $w_{F'_t}$.

Next, we bound the sparsity of $G'$. By Lemma 31, $F'_t$ contains at most $\tilde{O}(n)$ edges. As a result, we get the required bound

$$|E'| = \sum_{t=1}^{T} |F'_t| \le \tilde{O}(Tn) = \tilde{O}(n).$$

For the cut approximation, we consider any $S \subseteq V$. By the cut guarantee of Lemma 31, we have that, for all $t \in [T]$,

$$(1 - \varepsilon)|\partial(S) \cap F_t| \le w_{F'_t}(\partial(S) \cap F'_t) \le (1 + \varepsilon)|\partial(S) \cap F_t|.$$

Summing over all $t$, as $E' = \bigcup F'_t$ and $E = \bigcup F_t$, we obtain the required approximation

$$(1 - \varepsilon)|\partial_G(S)| \le w'(\partial_{G'}(S)) \le (1 + \varepsilon)|\partial_G(S)|.$$

The congestion of $M$ can be bounded as follows:

$$\mathrm{cong}(M) \le \sum_{t=1}^{T} \mathrm{cong}(H_t) = \tilde{O}(T) = \tilde{O}(1).$$

To conclude the proof, we address the efficiency of the flow sparsifier. The algorithm applies the routine of Lemma 31 for $T = \tilde{O}(1)$ times and hence runs in time $\tilde{O}(m)$, as required. Invoking the embedding $M$ requires invoking each of the $T$ embeddings $H_t$. This takes time $\tilde{O}(Tm) = \tilde{O}(m)$.

□
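The iterative use of Lemma 31 in the proof above can be sketched as a simple peeling loop. This is only an illustration of the control flow: `sparsify_step` is a hypothetical stand-in for the routine of Lemma 31, and `toy_step` mimics its interface without providing any of its cut or congestion guarantees.

```python
import math


def peel(edges, sparsify_step):
    """Iteratively apply a routine that handles at least half of the edges.

    sparsify_step (hypothetical) takes the current edge list E_t and returns
    (F, F_prime, weights), where F is the handled part with |F| >= |E_t| / 2
    and F_prime is a subset of F carrying the given weights.
    """
    remaining = list(edges)
    rounds = []
    while remaining:
        F, F_prime, weights = sparsify_step(remaining)
        assert 2 * len(F) >= len(remaining), "routine must cover half the edges"
        rounds.append((F_prime, weights))
        handled = set(F)
        remaining = [e for e in remaining if e not in handled]  # E_{t+1} = E_t minus F_t
    return rounds


def toy_step(edges):
    """Toy routine: handle the first half (rounded up) and keep every other
    handled edge with weight 2. It only mimics the interface."""
    half = edges[: (len(edges) + 1) // 2]
    kept = half[::2]
    return half, kept, {e: 2.0 for e in kept}


edges = [(i, i + 1) for i in range(16)]
rounds = peel(edges, toy_step)
```

Because every round removes at least half of the remaining edges, the loop finishes after logarithmically many rounds, and the kept edge sets from different rounds are disjoint by construction.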

5.3.1 Flow Sparsification of Unweighted Graphs: Proof of Lemma 31 

In this subsection, we prove Lemma 31. 

Our starting point is the following decomposition statement, which shows that we can form a 
partition of an unweighted graph where most edges do not cross the boundaries and the subgraphs 
induced within each set of this partition are near-expanders. The following lemma is implicit in 
Spielman and Teng's local clustering approach to spectral sparsification [.V2] . 

Lemma 32 (Decomposition Lemma). For an unweighted graph $G = (V, E)$, in $\tilde{O}(m)$-time we can produce a partition $V_1, \ldots, V_k$ of $V$ and a collection of sets $S_1, \ldots, S_k \subseteq V$ with the following properties:

• For all $i$, $S_i$ is contained in $V_i$.

• For all $i$, there exists a set $T_i$ with $S_i \subseteq T_i \subseteq V_i$, such that

$$\Phi(G(T_i)) \ge \Omega\left(\frac{1}{\log^2 n}\right).$$

• At least half of the edges are found within the sets $\{S_i\}$, i.e.

$$\sum_{i=1}^{k} |E(S_i)| = \sum_{i=1}^{k} |\{e = \{a, b\} : a \in S_i, b \in S_i\}| \ge \frac{1}{2}|E|.$$

To design an algorithm satisfying the requirements of Lemma 31, we start by applying the Decomposition Lemma to our unweighted input graph $G = (V, E)$ to obtain the partition $\{V_i\}_{i \in [k]}$ and the sets $\{S_i\}_{i \in [k]}$. We let $G_i = (S_i, E(S_i))$. To reduce the number of edges, while preserving cuts, we apply a spectral sparsification algorithm to each $G_i$. Concretely, by applying the spectral sparsification by effective resistances of Spielman and Srivastava [29] to each $G_i$, we obtain weighted graphs $G'_i = (S_i, E'_i \subseteq E(S_i), w'_i)$ in time $\sum_{i=1}^{k} \tilde{O}(|E(S_i)|) \le \tilde{O}(|E|)$ with $|E'_i| \le \tilde{O}(|S_i|)$ and the property that cuts are preserved for all $i$:

$$\forall S \subseteq S_i, \quad (1 - \varepsilon) \cdot |\partial_{G_i}(S)| \le w'_i(\partial_{G'_i}(S)) \le (1 + \varepsilon) \cdot |\partial_{G_i}(S)|.$$

(The spectral sparsification result actually yields the stronger spectral approximation guarantee, but for our purposes the cut guarantee suffices.)

Moreover, the spectral sparsification of [29] constructs the weights $\{w'_i(e)\}_{e \in E'_i}$ such that

$$\forall e \in E'_i, \quad \frac{1}{\mathrm{poly}(n)} \le \frac{1}{\mathrm{poly}(|S_i|)} \le w'_i(e) \le |S_i| \le n.$$

To complete the description of the algorithm, we output the partition $(F, \bar{F})$ of $E$, where

$$F = \bigcup_{i=1}^{k} E(S_i).$$

We also output the set of weighted sparsified edges $F'$:

$$F' = \bigcup_{i=1}^{k} E'_i.$$

The weight $w_{F'}(e)$ of edge $e \in F'$ is given by finding $i$ such that $e \in E'_i$ and setting $w_{F'}(e) = w'_i(e)$.

We now depart from Spielman and Teng's construction by endowing our $F'$ with an embedding onto $G$. The embedding $H : \mathbb{R}^{F'} \to \mathbb{R}^{E}$ of the graph $H' = (V, F', w_{F'})$ to $G$ is constructed by using the oblivious electrical-flow routing of $E(S_i)$ into $G(V_i)$. More specifically, as the sets $\{V_i\}$ partition $V$, the embedding $H$ can be expressed as the following direct sum over the orthogonal subspaces:

$$H = \bigoplus_{i=1}^{k} B_{E(V_i)} L^{\dagger}_{G(V_i)} B^{T}_{E(V_i)} I_{(E(V_i), E'_i)},$$

where $I_{(E(V_i), E'_i)}$ is the identity mapping of the edges $E'_i \subseteq E(V_i)$ of $F'$ over $V_i$ to the edges $E(V_i)$ of $V_i$ in $G$. Notice that there is no dependence on the resistances over $G$ as $G$ is unweighted.
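To make the electrical-flow routing of a single edge demand concrete, the following sketch routes one unit between two vertices of a small unweighted graph. It is not the nearly-linear-time solver the paper relies on: it solves the Laplacian system by exact Gaussian elimination, and the function name `electrical_flow`, the grounding of the last vertex, and the connectivity of the graph are assumptions of this illustration.

```python
from fractions import Fraction


def electrical_flow(n, edges, a, b):
    """Route one unit from a to b by electrical flow in an unweighted graph.

    Solves L * phi = chi_{a,b} with vertex n-1 grounded (phi = 0), then
    returns f(u, v) = phi[u] - phi[v] for every edge (u, v). Grounding only
    shifts the potentials, so the potential differences are unaffected.
    The graph on vertices 0..n-1 is assumed to be connected.
    """
    # Build the graph Laplacian with exact rational arithmetic.
    L = [[Fraction(0)] * n for _ in range(n)]
    for u, v in edges:
        L[u][u] += 1
        L[v][v] += 1
        L[u][v] -= 1
        L[v][u] -= 1
    chi = [Fraction(0)] * n
    chi[a] += 1
    chi[b] -= 1
    # Drop the grounded vertex and solve the reduced system by elimination.
    k = n - 1
    A = [L[i][:k] + [chi[i]] for i in range(k)]
    for col in range(k):
        pivot = next(r for r in range(col, k) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        A[col] = [x / A[col][col] for x in A[col]]
        for r in range(k):
            if r != col and A[r][col] != 0:
                factor = A[r][col]
                A[r] = [x - factor * y for x, y in zip(A[r], A[col])]
    phi = [A[i][k] for i in range(k)] + [Fraction(0)]
    return [phi[u] - phi[v] for u, v in edges]


triangle = [(0, 1), (1, 2), (0, 2)]
flow = electrical_flow(3, triangle, 0, 1)
```

On the triangle, two thirds of the unit travels over the direct edge and one third over the two-edge detour, matching the parallel-resistance intuition.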

This completes the description of the algorithm. We are now ready to give the proof of Lemma 31.

Proof of Lemma 31. The algorithm described above performs a decomposition of the input graph $G = (V, E)$ in time $\tilde{O}(m)$ by the Decomposition Lemma. By the result of Spielman and Srivastava [29], each $G_i$ is sparsified in time $\tilde{O}(|E(S_i)|)$. Hence, the sparsification step requires time $\tilde{O}(m)$ as well. This shows that the algorithm runs in $\tilde{O}(m)$-time, as required.

By the Decomposition Lemma, we know that $|F| = \sum_{i=1}^{k} |E(S_i)| \ge \frac{|E|}{2}$, which satisfies the requirement of the Lemma. Moreover, by the spectral sparsification result, we know that $|F'| = \sum_{i=1}^{k} |E'_i| \le \sum_{i=1}^{k} \tilde{O}(|S_i|) \le \tilde{O}(n)$, as required. We also saw that by construction the weights $w_{F'}$ are bounded:

$$\forall e \in F', \quad \frac{1}{\mathrm{poly}(n)} \le w_{F'}(e) \le n.$$

To obtain the cut-approximation guarantee, we use the fact that for every $i$, by spectral sparsification,

$$\forall S \subseteq S_i, \quad (1 - \varepsilon) \cdot |\partial_{G_i}(S)| \le w'_i(\partial_{G'_i}(S)) \le (1 + \varepsilon) \cdot |\partial_{G_i}(S)|.$$

We have $H' = (V, F', w_{F'})$ and $H = (V, F)$. Consider now $T \subseteq V$ and apply the previous bound to $T \cap S_i$ for all $i$. Because $F' \subseteq F = \bigcup_{i=1}^{k} E(S_i)$, we have that summing over the $k$ bounds yields

$$\forall T \subseteq V, \quad (1 - \varepsilon)|\partial_H(T)| \le w_{F'}(\partial_{H'}(T)) \le (1 + \varepsilon)|\partial_H(T)|,$$

which is the desired cut-approximation guarantee.
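Finally, we are left to prove that the embedding $H$ from $H' = (V, F', w_{F'})$ to $G = (V, E)$ has low congestion and can be applied efficiently. Writing $U_{F'}$ for the diagonal matrix of the weights $w_{F'}$, by definition of congestion,

$$\mathrm{cong}(H) = \max_{x \in \mathbb{R}^{F'}} \frac{\|Hx\|_\infty}{\|U_{F'}^{-1} x\|_\infty} = \big\| |H| \, U_{F'} 1_{F'} \big\|_\infty = \Big\| \bigoplus_{i=1}^{k} \big| B_{E(V_i)} L^{\dagger}_{G(V_i)} B^{T}_{E(V_i)} I_{(E(V_i), E'_i)} \big| \, U_{F'} 1_{F'} \Big\|_\infty.$$

Decomposing $\mathbb{R}^{E}$ into the subspaces $\{\mathbb{R}^{E(V_i)}\}$ and $\mathbb{R}^{F'}$ into the subspaces $\{\mathbb{R}^{E'_i}\}$ we have:

$$\mathrm{cong}(H) \le \max_{i \in [k]} \Big\| \big| B_{E(V_i)} L^{\dagger}_{G(V_i)} B^{T}_{E(V_i)} I_{(E(V_i), E'_i)} \big| \, U_{E'_i} 1_{E'_i} \Big\|_\infty.$$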

For each $i \in [k]$, consider now the set of demands $D_i$ over $V_i$, $D_i = \{\chi_e\}_{e \in E'_i}$, given by the edges of $E'_i$ with their capacities $w'_i$. That is, $\chi_e \in \mathbb{R}^{V_i}$ is the demand corresponding to edge $e \in E'_i$ with weight $w'_i(e)$. Consider also the electrical routing $A_{E,i} = B_{E(V_i)} L^{\dagger}_{G(V_i)}$ over $G(V_i)$. Then:

$$\mathrm{cong}(H) \le \max_{i \in [k]} \mathrm{cong}(A_{E,i} D_i).$$

Notice that, by construction, $D_i$ is routable in $G'_i = (S_i, E'_i, w'_i)$ and $\mathrm{opt}_{G'_i}(D_i) = 1$. But, by our use of spectral sparsifiers in the construction, $G'_i$ is an $\varepsilon$-cut approximation of $G_i$. Hence, by the flow-cut gap of Aumann and Rabani [ !], we have:

$$\mathrm{opt}_{G_i}(D_i) \le O(\log(|D_i|)) \cdot \mathrm{opt}_{G'_i}(D_i) \le \tilde{O}(1).$$

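When we route $D_i$ obliviously in $G(V_i)$, we can consider the $E(S_i)$-competitive ratio $\rho^{E(S_i)}_{G(V_i)}(A_{E,i})$ of the electrical routing $A_{E,i} = B_{E(V_i)} L^{\dagger}_{G(V_i)}$, as $D_i$ is routable in $E(S_i)$, because $E'_i \subseteq E(S_i)$. We have

$$\mathrm{cong}(H) \le \max_{i} \rho^{E(S_i)}_{G(V_i)}(A_{E,i}) \cdot \mathrm{opt}^{E(S_i)}_{G(V_i)}(D_i) = \max_{i} \rho^{E(S_i)}_{G(V_i)}(A_{E,i}) \cdot \mathrm{opt}_{G_i}(D_i).$$

Finally, putting these bounds together, we have:

$$\mathrm{cong}(H) \le \max_{i} \rho^{E(S_i)}_{G(V_i)}(A_{E,i}) \cdot \mathrm{opt}_{G_i}(D_i) \le \tilde{O}(1) \cdot \max_{i} \rho^{E(S_i)}_{G(V_i)}(A_{E,i}).$$

But, by the Decomposition Lemma, there exists $T_i$ with $S_i \subseteq T_i \subseteq V_i$ such that

$$\Phi(G(T_i)) \ge \Omega\left(\frac{1}{\log^2 n}\right).$$

Then, by Lemma 29, we have that:

$$\rho^{E(S_i)}_{G(V_i)}(A_{E,i}) \le O\left(\frac{\log \mathrm{vol}(G(T_i))}{\Phi(G(T_i))^2}\right) \le \tilde{O}(1).$$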

This concludes the proof that $\mathrm{cong}(H) \le \tilde{O}(1)$. To complete the proof of the Lemma, we just notice that $H$ can be invoked in time $\tilde{O}(m)$. A call of $H$ involves solving $k$ electrical problems, one for each $G(V_i)$. This can be done in time $\sum_{i=1}^{k} \tilde{O}(|E(V_i)|) \le \tilde{O}(m)$, using any of the nearly-linear Laplacian system solvers available, such as [i 1]. □

6 Removing Vertices in Oblivious Routing Construction

In this section we show how to reduce computing an efficient oblivious routing on a graph $G = (V, E)$ to computing an oblivious routing for $t$ graphs with $\tilde{O}(|E|/t)$ vertices and at most $|E|$ edges. Formally we show

Theorem 33 (Node Reduction (Restatement)). Let $G = (V, E, \vec{\mu})$ be an undirected capacitated graph with capacity ratio $U$. For all $t > 0$ in $\tilde{O}(t \cdot |E|)$ time we can compute graphs $G_1, \ldots, G_t$ each with at most $\tilde{O}\left(\frac{|E| \log(U)}{t}\right)$ vertices, at most $|E|$ edges, and capacity ratio at most $|V| \cdot U$, such that given oblivious routings $A_i$ for each $G_i$, in $\tilde{O}(t \cdot |E|)$ time we can compute an oblivious routing $A \in \mathbb{R}^{E \times V}$ on $G$ such that

$$T(A) = \tilde{O}\Big(t \cdot |E| + \sum_{i=1}^{t} T(A_i)\Big) \quad \text{and} \quad \rho(A) = \tilde{O}\Big(\max_i \rho(A_i)\Big).$$



We break this proof into several parts. First we show how to embed $G$ into a collection of $t$ graphs consisting of trees minus some edges, which we call partial tree embeddings (Section 6.1). Then we show how to embed a partial tree embedding in an "almost $j$-tree" [19], that is, a graph consisting of a tree and a subgraph on at most $j$ vertices, for $j = 2t$ (Section 6.2). Next, we show how to reduce oblivious routing on an almost $j$-tree to oblivious routing on a graph with at most $O(j)$ vertices by removing degree-1 and degree-2 vertices (Section 6.3). Finally, in Section 6.4 we put this all together to prove Theorem 17.

We remark that many of the ideas in this section were either highly influenced by [1 ] or are direct restatements of theorems from [19] adapted to our setting. We encourage the reader to look over that paper for further details regarding the techniques used in this section.

6.1 From Graphs to Partial Tree Embeddings 

To prove Theorem 17, we make heavy use of spanning trees and various properties of them. In particular, we use the facts that for every pair of vertices there is a unique tree path connecting them, that every edge in the tree induces a cut in the graph, and that we can embed a graph in a tree by simply routing every edge over its tree path, with the congestion of this embedding determined by the load the edges place on tree edges. We define these quantities formally below.

Definition 34 (Tree Path). For undirected graph $G = (V, E)$, spanning tree $T$, and all $a, b \in V$ we let $P_{a,b} \subseteq E$ denote the unique path from $a$ to $b$ using only edges in $T$ and we let $\vec{p}_{a,b} \in \mathbb{R}^{E}$ denote the vector representation of this path, corresponding to the unique vector sending one unit from $a$ to $b$ that is nonzero only on $T$ (i.e. $B^{T} \vec{p}_{a,b} = \chi_{a,b}$ and $\forall e \in E \setminus T$ we have $\vec{p}_{a,b}(e) = 0$).

Definition 35 (Tree Cuts). For undirected $G = (V, E)$ and spanning tree $T \subseteq E$ the edges cut by $e$, $\partial_T(e)$, and the edges cut by $F$, $\partial_T(F)$, are given by

$$\partial_T(e) = \{e' \in E \mid e \in P_{e'}\} \quad \text{and} \quad \partial_T(F) = \bigcup_{e \in F} \partial_T(e).$$

Definition 36 (Tree Load). For undirected capacitated $G = (V, E, \vec{\mu})$ and spanning tree $T \subseteq E$ the load on edge $e \in E$ by $T$, $\mathrm{load}_T(e)$, is given by $\mathrm{load}_T(e) = \sum_{e' \in E \mid e \in P_{e'}} \mu_{e'}$.

While these properties do highlight the fact that we could just embed our graph into a collection of trees to simplify the structure of our graph, this approach suffers from a high computational cost [25]. Instead we show that we can embed parts of the graph onto collections of trees at a lower computational cost but higher complexity. In particular we will consider what we call partial tree embeddings.
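The tree path and tree load definitions above can be made concrete with a short sketch. The representation (vertices numbered from 0, edges as pairs, tree edges stored as frozensets) and the function name `tree_paths_and_load` are assumptions of this illustration, and the simple parent-walking approach is quadratic in the worst case rather than the efficient implementation a fast algorithm would need.

```python
def tree_paths_and_load(n, tree_edges, graph_edges, capacity):
    """Compute tree paths P_e and load_T(e) for a spanning tree T.

    tree_edges and graph_edges are lists of (u, v) pairs on vertices 0..n-1;
    capacity maps each graph edge to its capacity. Every tree edge is
    assumed to appear in graph_edges as well.
    """
    # Root the tree at vertex 0 and record parents and depths by DFS.
    adj = {v: [] for v in range(n)}
    for u, v in tree_edges:
        adj[u].append(v)
        adj[v].append(u)
    parent, depth, stack = {0: None}, {0: 0}, [0]
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w not in parent:
                parent[w], depth[w] = u, depth[u] + 1
                stack.append(w)

    def path(a, b):
        """Tree edges (as frozensets) on the unique a-b tree path."""
        edges = []
        while a != b:
            if depth[a] < depth[b]:
                a, b = b, a  # always step up from the deeper endpoint
            edges.append(frozenset((a, parent[a])))
            a = parent[a]
        return edges

    load = {frozenset(e): 0 for e in tree_edges}
    paths = {}
    for e in graph_edges:
        paths[e] = path(*e)
        for t in paths[e]:
            load[t] += capacity[e]  # e' adds mu(e') to every tree edge on P_{e'}
    return paths, load


tree = [(0, 1), (1, 2), (2, 3)]
graph = tree + [(0, 3)]
cap = {(0, 1): 1, (1, 2): 1, (2, 3): 1, (0, 3): 5}
paths, load = tree_paths_and_load(4, tree, graph, cap)
```

In this example the single non-tree edge (0, 3) is routed over the whole path, so every tree edge carries its own capacity plus the capacity of that edge.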

Definition 37 (Partial Tree Embedding). For undirected capacitated graph $G = (V, E, \vec{\mu})$, spanning tree $T$ and spanning tree subset $F \subseteq T$ we define the partial tree embedding graph $H = H(G, T, F) = (V, E', \vec{\mu}')$ to be a graph on the same vertex set where $E' = T \cup \partial_T(F)$ and

$$\forall e \in E' : \quad \mu'(e) = \begin{cases} \mathrm{load}_T(e) & \text{if } e \in T \setminus F, \\ \mu(e) & \text{otherwise.} \end{cases}$$

Furthermore, we let $M_H \in \mathbb{R}^{E' \times E}$ denote the embedding from $G$ to $H(G, T, F)$ where edges not cut by $F$ are routed over the tree and other edges are mapped to themselves,

$$\forall e \in E : \quad M_H(e) = \begin{cases} \vec{p}_e & \text{if } e \notin \partial_T(F), \\ \vec{1}_e & \text{otherwise,} \end{cases}$$

and we let $M'_H \in \mathbb{R}^{E \times E'}$ denote the embedding from $H$ to $G$ that simply maps edges in $H$ to their corresponding edges in $G$, i.e. $\forall e \in E'$, $M'_H(e) = \vec{1}_e$.

Note that by definition $\mathrm{cong}(M_H) \le 1$, i.e. a graph embeds into its partial tree embedding with no congestion. However, to get embedding guarantees in the other direction more work is required. For this purpose we use a lemma from Madry [19] saying that we can construct a convex combination, or a distribution, of partial tree embeddings for which we can get such a guarantee.

Lemma 38 (Probabilistic Partial Tree Embedding). For any undirected capacitated graph $G = (V, E, \vec{\mu})$ and $t > 0$, in $\tilde{O}(t \cdot m)$ time we can find a collection of partial tree embeddings $H_1 = H(G, T_1, F_1), \ldots, H_t = H(G, T_t, F_t)$ and coefficients $\lambda_i \ge 0$ with $\sum_i \lambda_i = 1$ such that $\forall i \in [t]$ we have $|F_i| = \tilde{O}\left(\frac{m \log U}{t}\right)$ and such that $\sum_i \lambda_i M'_{H_i}$ embeds $G' = \sum_i \lambda_i G_i$ into $G$ with congestion $O(1)$.

Using this lemma, we can prove that we can reduce constructing an oblivious routing for a graph 
to constructing oblivious routings on several partial tree embeddings. 

Lemma 39. Let the $H_i$ be graphs produced by Lemma 38 and for all $i$ let $A_i$ be an oblivious routing algorithm for $H_i$. It follows that $A = \sum_i \lambda_i M'_{H_i} A_i$ is an oblivious routing on $G$ with $\rho(A) \le O(\max_i \rho(A_i) \log n)$ and $T(A) = O(\sum_i T(A_i))$.

Proof. The proof is similar to the proof of Lemma 15. For all $i$ let $U_i$ denote the capacity matrix of graph $G_i$. Then using Lemma 10 we get

$$\rho(A) = \big\| U^{-1} A B^{T} U \big\|_\infty = \Big\| \sum_{i=1}^{t} \lambda_i U^{-1} M'_{H_i} A_i B^{T} U \Big\|_\infty.$$

Using that $M_{H_i}$ is an embedding and therefore $B^{T}_{H_i} M_{H_i} = B^{T}$ we get

$$\rho(A) = \Big\| \sum_{i=1}^{t} \lambda_i U^{-1} M'_{H_i} A_i B^{T}_{H_i} M_{H_i} U \Big\|_\infty \le \max_j \Big\| \sum_{i=1}^{t} \lambda_i U^{-1} M'_{H_i} U_i \Big\|_\infty \cdot \rho(A_j) \cdot \mathrm{cong}(M_{H_j}).$$

The result follows from $\sum_i \lambda_i M'_{H_i}$ being an embedding of congestion at most $O(1)$ and $\mathrm{cong}(M_{H_i}) \le 1$. □

^Definition 37 is a restatement of the $H(T, F)$ graphs in [19].
^Lemma 38 is an adaptation of Corollary 5.6 in [ ].



6.2 From Partial Tree Embeddings To Almost-j-trees

Here we show how to reduce constructing an oblivious routing for a partial tree embedding to constructing an oblivious routing for what Madry [19] calls an "almost $j$-tree," the union of a tree plus a subgraph on at most $j$ vertices. First we define such objects and then we prove the reduction.

Definition 40 (Almost $j$-tree). We call a graph $G = (V, E)$ an almost $j$-tree if there is a spanning tree $T \subseteq E$ such that the endpoints of $E \setminus T$ include at most $j$ vertices.

Lemma 41. For undirected capacitated $G = (V, E, \vec{\mu})$ and partial tree embedding $H = H(G, T, F)$, in $\tilde{O}(|E|)$ time we can construct an almost $2 \cdot |F|$-tree $G' = (V, E', \vec{\mu}')$ with $|E'| \le |E|$ and an embedding $M'$ from $G'$ to $H$ such that $H$ is embeddable into $G'$ with congestion 2, $\mathrm{cong}(M') = 2$, and $T(M') = \tilde{O}(|E|)$.

Proof. For every $e = (a, b) \in E$, we let $v^1(e) \in V$ denote the first vertex on tree path $P_{(a,b)}$ incident to $F$ and we let $v^2(e) \in V$ denote the last vertex incident to $F$ on tree path $P_{(a,b)}$. Note that for every $e = (a, b) \in T$ we have that $(v^1(e), v^2(e)) = e$.

We define $G' = (V, E', \vec{\mu}')$ to simply be the graph that consists of all these $(v^1(e), v^2(e))$ pairs

$$E' = \{(a, b) \mid \exists e \in E \text{ such that } (a, b) = (v^1(e), v^2(e))\}$$

and we define the weights to simply be the sums

$$\forall e' \in E' : \quad \mu'(e') = \sum_{e \in E \mid e' = (v^1(e), v^2(e))} \mu(e).$$

Now to embed $H$ in $G'$ we define $M$ by

$$\forall e = (a, b) \in E : \quad M \vec{1}_e = \vec{p}_{a, v^1(e)} + \vec{1}_{(v^1(e), v^2(e))} + \vec{p}_{v^2(e), b}$$

and to embed $G'$ in $H$ we define $M'$ by

$$\forall e' \in E' : \quad M' \vec{1}_{e'} = \sum_{e = (a, b) \in E \mid e' = (v^1(e), v^2(e))} \frac{\mu(e)}{\mu'(e')} \left[ \vec{p}_{v^1(e), a} + \vec{1}_{(a, b)} + \vec{p}_{b, v^2(e)} \right].$$

In other words we route edges in $H$ along the tree until we encounter nodes in $F$ and then we route them along added edges, and we simply route the other way for the reverse embedding. By construction clearly the congestion of the embedding in either direction is 2.

To bound the running time, we note that by having every edge $e$ in $H$ maintain its $v^1(e)$ and $v^2(e)$ information, having every edge $e'$ in $E'$ maintain the set $\{e \in E \mid e' = (v^1(e), v^2(e))\}$ in a list, and using link cut trees [ - ] or the static tree structure in [ ■ ] to update information along tree paths we can obtain the desired value of $T(M')$. □

6.3 From Almost-j-Trees to Fewer Vertices

Here we show that by "greedy elimination" [31] [12] [14], i.e. removing all degree 1 and degree 2 vertices in $O(m)$ time, we can reduce oblivious routing in almost-$j$-trees to oblivious routing in graphs with $O(j)$ vertices while only losing $O(1)$ in the competitive ratio. Again, we remark that the lemmas in this section are derived heavily from [19] but repeated for completeness and to prove additional properties that we will need for our purposes.

We start by showing that an almost-$j$-tree with no degree 1 or degree 2 vertices has at most $O(j)$ vertices.

Lemma 42. For any almost $j$-tree $G = (V, E)$ with no degree 1 or degree 2 vertices, we have $|V| \le 3j - 2$.

Proof. Since $G$ is an almost $j$-tree, there is some $J \subseteq V$ with $|J| \le j$ such that the removal of all edges with both endpoints in $J$ creates a forest. Now, since $K = V - J$ is incident only to forest edges, clearly the sum of the degrees of the vertices in $K$ is at most $2(|V| - 1)$ (otherwise there would be a cycle). However, since the minimum degree in $G$ is 3, clearly this sum is at least $3(|V| - j)$. Combining yields that $3|V| - 3j \le 2|V| - 2$. □
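For completeness, the final rearrangement in the proof of Lemma 42 is the following one-line computation (a routine step spelled out here, not an additional claim):

```latex
3|V| - 3j \;\le\; 2|V| - 2
\quad\Longrightarrow\quad
|V| \;\le\; 3j - 2 .
```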

Next, we show how to remove degree one vertices efficiently. 

Lemma 43 (Removing Degree One Vertices). Let $G = (V, E, \vec{\mu})$ be an undirected capacitated graph, let $a \in V$ be a degree 1 vertex, let $e = (a, b) \in E$ be the single edge incident to $a$, and let $G' = (V', E', \vec{\mu}')$ be the graph that results from simply removing $e$ and $a$, i.e. $V' = V \setminus \{a\}$ and $E' = E \setminus \{e\}$. Given $a \in V$ and an oblivious routing algorithm $A'$ in $G'$, in $O(1)$ time we can construct an oblivious routing algorithm $A$ in $G$ such that

$$T(A) = O(T(A') + 1), \quad \text{and} \quad \rho(A) = \rho(A').$$

Proof. For any demand vector $\vec{\chi}$, the only way to route demand at $a$ in $G$ is over $e$. Therefore, if $B^{T} \vec{f} = \vec{\chi}$ then $\vec{f}(e) = \chi_a$. Therefore, to get an oblivious routing algorithm on $G$, we can simply send demand at $a$ over edge $e$, modify the demand at $b$ accordingly, and then run the oblivious routing algorithm on $G'$ on the remaining vertices. The routing algorithm we get is the following

Since all routing algorithms send this flow on $e$ we get that $\rho(A) = \rho(A')$, and since the above operators not counting $A'$ have only $O(1)$ entries that are not the identity we can clearly implement the operations in the desired running time. □
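The reduction in Lemma 43 can be sketched as a thin wrapper around a routing on the smaller graph. The dictionary-based representation, the function name `route_with_leaf`, and the callback `route_rest` (standing in for the oblivious routing $A'$ on $G'$) are all assumptions made for this illustration.

```python
def route_with_leaf(demand, leaf, neighbor, route_rest):
    """Extend a routing on G' = G minus {leaf} to a routing on G.

    demand: dict vertex -> demand (entries sum to zero).
    route_rest: hypothetical routing on G' that maps a demand dict on the
    remaining vertices to a dict of edge flows.
    The leaf's demand is forced over the edge (leaf, neighbor); the
    neighbor absorbs it and the rest is routed in G'.
    """
    reduced = {v: d for v, d in demand.items() if v != leaf}
    reduced[neighbor] = reduced.get(neighbor, 0) + demand.get(leaf, 0)
    flow = dict(route_rest(reduced))
    flow[(leaf, neighbor)] = demand.get(leaf, 0)  # oriented leaf -> neighbor
    return flow


def toy_route_rest(d):
    """Toy routing on the single-edge graph b - c: push b's demand to c."""
    return {("b", "c"): d["b"]}


flow = route_with_leaf({"a": 1, "b": 0, "c": -1}, "a", "b", toy_route_rest)
```

Only a constant amount of work is added on top of the call to the routing on $G'$, which mirrors the $O(T(A') + 1)$ bound in the lemma.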

Using the above lemma we show how to remove all degree 1 and 2 vertices in $O(m)$ time while only increasing the congestion by $O(1)$.

Lemma 44 (Greedy Elimination). Let $G = (V, E, \vec{\mu})$ be an undirected capacitated graph and let $G' = (V', E', \vec{\mu}')$ be the graph that results from iteratively removing vertices of degree 1 and replacing degree 2 vertices with an edge connecting their neighbors whose capacity is the minimum capacity of the adjacent edges. We can construct $G'$ in $O(m)$ time, and given an oblivious routing algorithm $A'$ in $G'$, in $O(1)$ time we can construct an oblivious routing algorithm $A$ in $G$ such that

$$T(A) = O(T(A') + |E|), \quad \text{and} \quad \rho(A) \le 4 \cdot \rho(A').$$

Proof. First we repeatedly apply Lemma 43 to reduce to the case that there are no degree 1 vertices. By simply maintaining an array of the degrees of every vertex and a list of degree 1 vertices this can be done in $O(m)$ time. We denote the result of these operations by graph $K$.

Next, we repeatedly find degree two vertices that have not been explored and explore each such vertex's neighbors to get a path of vertices $a_1, a_2, \ldots, a_k \in V$ for $k \ge 3$ such that each vertex $a_2, \ldots, a_{k-1}$ is of degree two. We then compute $j = \arg\min_{i \in [k-1]} \mu(a_i, a_{i+1})$, remove edge $(a_j, a_{j+1})$ and add an edge $(a_1, a_k)$ of capacity $\mu(a_j, a_{j+1})$. We denote the result of doing this for all degree two vertices by $K'$ and note that again by careful implementation this can be performed in $O(m)$ time.

Note that clearly $K$ is embeddable in $K'$ with congestion 2 just by routing every edge over itself except the removed edges, which we route by the path plus the added edges. Furthermore, $K'$ is embeddable in $K$ with congestion 2, again by routing every edge on itself except for the edges which we added, which we route back over the paths they came from. Furthermore, we note that clearly this embedding and the transpose of this operator are computable in $O(m)$ time.

Finally, by again repeatedly applying Lemma 43 to $K'$ until there are no degree 1 vertices we get a graph $G'$ that has no degree one or degree two vertices (since nothing decreased the degree of vertices with degree more than two). Furthermore, by Lemma 43 and by Lemma 15 we see that we can compose these operators to compute $A$ with the desired properties. □

6.4 Putting It All Together 

Here we put together the previous components to prove the main theorem of this section. 

Node Reduction Theorem 17. Using Lemma 38 we can construct $G' = \sum_{i=1}^{t} \lambda_i G_i$ and embeddings $M_1, \ldots, M_t$ from $G_i$ to $G$. Next we can apply Lemma 41 to each $G_i$ to get almost-$j$-trees $G'_1, \ldots, G'_t$ and embeddings $M'_1, \ldots, M'_t$ from $G'_i$ to $G_i$. Furthermore, using Lemma 44 we can construct graphs $G''_1, \ldots, G''_t$ with the desired properties (the capacity ratio property follows from the fact that we only add capacities during these reductions).

Now given oblivious routing algorithms $A''_1, \ldots, A''_t$ on the $G''_i$, again by Lemma 44 we can get oblivious routing algorithms $A'_1, \ldots, A'_t$ on the $G'_i$ with constant times more congestion. Finally, by the guarantees of Lemma 15 we have that $A = \sum_{i=1}^{t} \lambda_i M_i M'_i A'_i$ is an oblivious routing algorithm that satisfies the requirements. □



Note that the constant of 4 below is improved to 3 in [19]. 






7 Nonlinear Projection and Maximum Concurrent Flow 

7.1 Gradient Descent Method for Nonlinear Projection Problem 

In this section, we strengthen and generalize the MaxFlow algorithm to a more general setting. We believe this algorithm may be of independent interest as it includes the maximum concurrent flow problem, the compressive sensing problem, etc. For some norms, e.g. $\|\cdot\|_1$ as is typically of interest in compressive sensing, the Nesterov algorithm [21] can be used to replace gradient descent. However, this kind of accelerated method is not known in the general norm setting, as a good proxy function may not exist at all. Even worse, in the non-smooth regime, the minimization problem on $\|\cdot\|_p$ with $p > 2$ can be proven to be difficult [- ]. For these reasons we focus here on the gradient descent method, which is always applicable.

Given a norm $\|\cdot\|$, we wish to solve what we call the non-linear projection problem

$$\min_{x \in L} \|x - y\|$$

where $y$ is a given point and $L$ is a linear subspace. We assume the following:

Assumption 45.

1. There is a family of convex differentiable functions $f_t$ such that for all $x \in L$, we have

$$\|x\| \le f_t(x) \le \|x\| + Kt$$

and the Lipschitz constant of $\nabla f_t$ is $\frac{1}{t}$.

2. There is a projection matrix $P$ onto the subspace $L$.

In other words we assume that there is a family of regularized objective functions $f_t$ and a projection matrix $P$, which we can think of as an approximation algorithm for this projection problem.

Now, let $x^*$ be a minimizer of $\min_{x \in L} \|x - y\|$. Since $x^* \in L$, we have $P x^* = x^*$ and hence

$$\|Py - y\| \le \|y - x^*\| + \|x^* - Py\| = \|y - x^*\| + \|P x^* - P y\| \le (1 + \|P\|) \min_{x \in L} \|x - y\|. \quad (4)$$

Therefore, the approximation ratio of $P$ is $1 + \|P\|$ and we see that our problem is to show that we can solve nonlinear projection using a decent linear projection matrix. Our algorithm for solving this problem is below.

NonlinearProjection

Input: a point $y$ and $\mathrm{OPT} = \min_{x \in L} \|x - y\|$.

1. Let $y_0 = (I - P) y$ and $x_0 = 0$.

2. For $j = 0, 1, \ldots$ until $2^{-j} \|P\| \le \frac{1}{2}$:

3. If $2^{-j} \|P\| \ge 1$, then let $t_j = \frac{2^{-(j+2)} \|P\| \, \mathrm{OPT}}{K}$ and $k_j = 3200 \|P\|^2 K$.

4. If $2^{-j} \|P\| < 1$, then let $t_j = \frac{\varepsilon \, \mathrm{OPT}}{2K}$ and $k_j = \frac{800 \|P\|^2 K}{\varepsilon^2}$.

5. Let $g_j(x) = f_{t_j}(Px - y_j)$ and $x_0 = 0$.

6. For $i = 0, \ldots, k_j - 1$:

7. $x_{i+1} = x_i - \frac{t_j}{\|P\|^2} \left( \nabla g_j(x_i) \right)^{\#}$.

8. Let $y_{j+1} = y_j - P x_{k_j}$.

9. Output $y - y_{\mathrm{last}}$.

Note that this algorithm and its proof are quite similar to Theorem 4 but modified to scale parameters over an outer loop. By changing the parameter $t$ we can decrease the dependence on the initial error.

Theorem 46. Assume the conditions in Assumption 45 are satisfied. Let $T$ be the time needed to compute $Px$ and $P^{T} x$ and $x^{\#}$. Then, NonlinearProjection outputs a vector $x$ with $\|x - y\| \le (1 + \varepsilon) \min_{x' \in L} \|x' - y\|$ and the algorithm takes time

$$O\left( \|P\|^2 K \, (T + m) \left( \frac{1}{\varepsilon^2} + \log \|P\| \right) \right).$$



Proof. We prove by induction on $j$ that when $2^{-(j-1)} \|P\| \ge 1$ we have $\|y_j\| \le (1 + 2^{-j} \|P\|) \, \mathrm{OPT}$.

For the base case ($j = 0$), (4) shows that $\|y_0\| \le (1 + \|P\|) \, \mathrm{OPT}$.

For the inductive case we assume that the assertion holds for some $j$. We start by bounding the corresponding $R$ in Theorem 1 for $g_j$, which we denote $R_j$. Note that

$$g_j(x_0) = f_{t_j}(-y_j) \le \|y_j\| + K t_j \le (1 + 2^{-j} \|P\|) \, \mathrm{OPT} + K t_j.$$

Hence, the condition that $g_j(x) \le g_j(x_0)$ implies that

$$\|Px - y_j\| \le (1 + 2^{-j} \|P\|) \, \mathrm{OPT} + K t_j.$$

Take any $y \in X^*$, let $c = x - Px + y$, and note that $Pc = Py$ and therefore $c \in X^*$. Using these facts, we can bound $R_j$ as follows

$$
\begin{aligned}
R_j &= \max_{x \in \mathbb{R}^{E} : g_j(x) \le g_j(x_0)} \; \min_{x^* \in X^*} \|x - x^*\| \\
&\le \max_{x \in \mathbb{R}^{E} : g_j(x) \le g_j(x_0)} \|x - c\| \\
&\le \max_{x \in \mathbb{R}^{E} : g_j(x) \le g_j(x_0)} \|Px - Py\| \\
&\le \max_{x \in \mathbb{R}^{E} : g_j(x) \le g_j(x_0)} \|Px\| + \|Py\| \\
&\le 2\|y_j\| + \|Px - y_j\| + \|Py - y_j\| \\
&\le 2\|y_j\| + 2\|Px - y_j\| \\
&\le 4 (1 + 2^{-j} \|P\|) \, \mathrm{OPT} + 2 K t_j.
\end{aligned}
$$

Similar to Lemma 3, the Lipschitz constant $L_j$ of $g_j$ is $\|P\|^2 / t_j$. Hence, Theorem 1 shows that

$$g_j(x_{k_j}) \le \min_x g_j(x) + \frac{2 \cdot L_j \cdot R_j^2}{k_j + 4} \le \min_x \|Px - y_j\| + \frac{2 \cdot L_j \cdot R_j^2}{k_j + 4} + K t_j.$$

So, we have

$$\|P x_{k_j} - y_j\| \le f_{t_j}(P x_{k_j} - y_j) \le \mathrm{OPT} + K t_j + \frac{2 \|P\|^2}{t_j (k_j + 4)} \left( 4 (1 + 2^{-j} \|P\|) \, \mathrm{OPT} + 2 K t_j \right)^2.$$

When $2^{-j} \|P\| \ge 1$, we have

$$t_j = \frac{2^{-(j+2)} \|P\| \, \mathrm{OPT}}{K} \quad \text{and} \quad k_j = 3200 \|P\|^2 K$$

and hence

$$\|y_{j+1}\| = \|P x_{k_j} - y_j\| \le (1 + 2^{-j-1} \|P\|) \, \mathrm{OPT}.$$

When $2^{-j} \|P\| < 1$, we have

$$t_j = \frac{\varepsilon \, \mathrm{OPT}}{2K} \quad \text{and} \quad k_j = \frac{800 \|P\|^2 K}{\varepsilon^2}$$

and hence

$$\|y_{j+1}\| = \|P x_{k_j} - y_j\| \le (1 + \varepsilon) \, \mathrm{OPT}.$$

Since $y_{\mathrm{last}}$ is $y$ plus some vectors in $L$, $y - y_{\mathrm{last}} \in L$ and $\|y - y_{\mathrm{last}} - y\| = \|y_{\mathrm{last}}\| \le (1 + \varepsilon) \, \mathrm{OPT}$.

□

^Scaling parameters over an outer loop is an idea that has been applied previously to solve linear programming problems [23].

7.2 Maximum Concurrent Flow

For an arbitrary set of demands $\vec{\chi}_i \in \mathbb{R}^{V}$ with $\sum_{v \in V} \chi_i(v) = 0$ for $i = 1, \ldots, k$, we wish to solve the following maximum concurrent flow problem

$$\max \alpha \quad \text{subject to} \quad B^{T} \vec{f}_i = \alpha \vec{\chi}_i \quad \text{and} \quad \Big\| U^{-1} \sum_{i=1}^{k} |\vec{f}_i| \Big\|_\infty \le 1.$$

Similar to Section 3.2, it is equivalent to the problem

$$\min_{x} \Big\| \sum_{i=1}^{k} \big| \vec{\alpha}_i + (Qx)_i \big| \Big\|_\infty$$

where $Q$ is a projection matrix onto the subspace $\{B^{T} U x_i = 0\}$, the output maximum concurrent flow is

$$\vec{f}_i(x) = U (\vec{\alpha}_i + (Qx)_i) \Big/ \Big\| \sum_{i=1}^{k} \big| \vec{\alpha}_i + (Qx)_i \big| \Big\|_\infty,$$

and $U \vec{\alpha}_i$ is any flow such that $B^{T} U \vec{\alpha}_i = \vec{\chi}_i$. In order to apply NonlinearProjection, we need to find a regularized norm and a good projection matrix. Let us define the norm

$$\|x\|_{1;\infty} = \max_{e \in E} \sum_{i=1}^{k} |x_i(e)|.$$

The problem is simply $\|\vec{\alpha} + Qx\|_{1;\infty}$, where $Q$ is a projection matrix from $\mathbb{R}^{E \times [k]}$ to $\mathbb{R}^{E \times [k]}$ onto some subspace. Since each copy of $\mathbb{R}^{E}$ is the same, there is no reason for coupling in $Q$ between different copies of $\mathbb{R}^{E}$. In the next lemma, we formalize this by the fact that any good projection matrix $P$ onto the subspace $\{B^{T} U x = 0\} \subseteq \mathbb{R}^{E}$ extends to a good projection $Q$ onto the subspace $\{B^{T} U x_i = 0\} \subseteq \mathbb{R}^{E \times [k]}$. Therefore, we can simply extend the good circulation projection $P$ by the formula $(Qx)_i = P x_i$. Thus, the only piece still needed is a regularized $\|\cdot\|_{1;\infty}$. However, it turns out that smoothing via conjugate does not work well in this case because the dual space of $\|\cdot\|_{1;\infty}$ involves $\|\cdot\|_\infty$, which is unfavorable for this kind of smoothing procedure. It can be proved that there is no such good regularized $\|\cdot\|_{1;\infty}$. Therefore, we could not achieve $O(m^{1+o(1)} k)$ using this approach; however, $O(m^{1+o(1)} k^2)$ is possible by using a bad regularized $\|\cdot\|_{1;\infty}$.

Lemma 47. Let $\mathrm{smaxL1}_t(x) = \mathrm{smax}_t\left( \sum_{i=1}^{k} \sqrt{(x_i(e))^2 + t^2} \right)$. It is a convex continuously differentiable function. The Lipschitz constant of $\nabla \mathrm{smaxL1}_t$ is $\frac{2}{t}$ and

$$\|x\|_{1;\infty} - t \ln(2m) \le \mathrm{smaxL1}_t(x) \le \|x\|_{1;\infty} + kt.$$
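The two-sided bound in Lemma 47 can be checked numerically on a small instance. The sketch below assumes the soft-max $\mathrm{smax}_t(z) = t \ln\big( \frac{1}{2m} \sum_{e} (e^{z_e/t} + e^{-z_e/t}) \big)$ that appears in the proof; the function names `smax`, `smax_l1` and `norm_1_inf` and the list-of-lists layout (one vector per commodity) are choices made for this illustration only.

```python
import math


def smax(z, t):
    """Soft-max t * ln( sum_e (exp(z_e/t) + exp(-z_e/t)) / (2m) )."""
    m = len(z)
    a = max(abs(v) for v in z) / t  # shift exponents to avoid overflow
    s = sum(math.exp(v / t - a) + math.exp(-v / t - a) for v in z)
    return t * (a + math.log(s / (2 * m)))


def smax_l1(x, t):
    """smaxL1_t(x) = smax_t( sum_i sqrt(x_i(e)^2 + t^2) ).

    x is a list of k commodity vectors, each of length m.
    """
    m = len(x[0])
    z = [sum(math.sqrt(xi[e] ** 2 + t * t) for xi in x) for e in range(m)]
    return smax(z, t)


def norm_1_inf(x):
    """||x||_{1;inf} = max_e sum_i |x_i(e)|."""
    m = len(x[0])
    return max(sum(abs(xi[e]) for xi in x) for e in range(m))


x = [[1.0, -2.0, 0.5], [0.0, 3.0, -1.0]]  # k = 2 commodities, m = 3 edges
t = 0.1
value = smax_l1(x, t)
norm = norm_1_inf(x)
```

As $t$ shrinks, both error terms $t \ln(2m)$ and $kt$ vanish, so the smoothed function approaches the norm while its gradient becomes less smooth, which is the trade-off the outer loop of NonlinearProjection manages.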

Proof. 1) It is clear that smaxLl^ is smooth. 
2) smaxLlj is convex. 
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Since smax^ is increasing for positive values and \/x^ + 1^ is convex, for any x, y € [R^^^^J and 
< t < 1, we have 

smaxLlt(tx + (1 - t)y) = smaxW ^ y ((txj + (1 - t)yi)(e))^ + t^ 

< smaxi [y^ {t^[x,{e)f + t^ + (1 - t)^ {y,{e)f + t^^ j 

< tsmaxLli(x) + (1 — t)smaxLli(y). 

3) The Lipschitz constant of VsniaxLl^ is |. 

Note that smaxj (not its gradient) has Lipschitz constant 1 because for any x, y G R , 

|sniaxi(x) — sniaxi(y)| 

^Eeei.M-^)+exp(^) 



tin 



In 



2171 



tin 



2ra 



Eee^(exp(-^)+exp(^: 



Eeei.(exp(-^)+exp(^) 



< t 



In ( maxexp 

X — y\\ . 



( Ax-y\{e] 



Also, by the definition of derivative, for any x,y £ R^ and t G R, we have 

sniaxi(x + ty) — smaxi(x) = t(\7sTaaxt{x), y^ + o{t). 
and it implies! (|VsmaX((x), yM < ||y|| for arbitrary y and hence 

||Vsmaxt(x)|| < 1. 

For notational simplicity, let si = smaxLlt, S2 = smaxf and sz{x) = vx^ + t^- Thus, we have 



(5) 



Si{x) = S2 ^S3(Xj(e)) . 



\i=l 



Now, we want to prove 



Note that 



I -. -.11 -^ II -. -.11 

Vsi(x) — Vsify) ,<- X — yL 

I -^ ^ ' -^^^'lloo;l ~ +'' ^Ml;oo 



dsi{x) ds2 
dxi{e) de 



ds-^ 



^S3{xi{e)) -7z{xi{e)) 
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Hence, we have 



max 



|Vsi(x)-Vsi(y)||^.^ = Yl 

e 
< y max 



Y. «3(a:j(e)) -0 (x,(e)) - ^ \Y1 ^3(%(e)) -0 {y^{e)) 



de 



Yl «3(a:j(e)) -0 (x,(e)) - ^ X] ""sixjie)) -0 (yi(e)) 



9e 



+E 



max 



"'' ' E»»(-.(')) 1 $ Me)) - '-i f E»3feM) 1 S (i'.f^)) 



9e 



E 



9S2 

de 



^S3(xj(e)) 



max 



da; 9e 



dsa 



+ Z]™P70(2/i(e)) 



^S3(2;j(e)) - 70 I X]^3(yi(^)) 



9e 



Since S3 has ^-Lipschitz gradient, we have 



By (5), we have 



Hence, we have 



ds2 
de 



< -\x — y\. 



E 



x(e)) 



< 1. 



E 



max 



ds^i , , .. ds3 
^Jx.ie)) ^Jy.ie)) 


>^S3(x,(e)) 

V i / 


{e)-m{e)\Y 

e 


ds3 (^ 
de [^ 


\ 

s?,{xi{e)) 
J 





I ^Ml;c 



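Since $s_3$ is 1-Lipschitz, we have

\[ \left| \frac{ds_3}{dx} \right| \le 1. \]

Since $s_2$ has $\frac{1}{t}$-Lipschitz gradient in $\|\cdot\|_\infty$, we have

\[ \sum_e \left| \frac{\partial s_2}{\partial e}(x) - \frac{\partial s_2}{\partial e}(y) \right| \le \frac{1}{t}\|x - y\|_\infty. \]

Hence, we have

\[
\begin{aligned}
\sum_e \max_i \left| \frac{ds_3}{dx}(y_i(e)) \right| \left| \frac{\partial s_2}{\partial e}\Bigl(\sum_j s_3(x_j(e))\Bigr) - \frac{\partial s_2}{\partial e}\Bigl(\sum_j s_3(y_j(e))\Bigr) \right|
&\le \sum_e \left| \frac{\partial s_2}{\partial e}\Bigl(\sum_j s_3(x_j(e))\Bigr) - \frac{\partial s_2}{\partial e}\Bigl(\sum_j s_3(y_j(e))\Bigr) \right| \\
&\le \frac{1}{t}\max_e \left| \sum_j s_3(x_j(e)) - \sum_j s_3(y_j(e)) \right| \\
&\le \frac{1}{t}\|x - y\|_{1;\infty},
\end{aligned}
\]

where the last step uses once more that $s_3$ is 1-Lipschitz.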



Therefore, we have

\[ \|\nabla s_1(x) - \nabla s_1(y)\|_{\infty;1} \le \frac{2}{t}\|x - y\|_{1;\infty}. \]



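4) Using the fact that

\[ \|x(e)\|_1 \le \sum_{i=1}^{k} \sqrt{x_i(e)^2 + t^2} \le \|x(e)\|_1 + kt \]

and that $\mathrm{smax}_t$ is 1-Lipschitz, we have

\[ \|x\|_{1;\infty} - t\ln(2m) \le \mathrm{smaxL1}_t(x) \le \|x\|_{1;\infty} + kt. \]

□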
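The sandwich bound in 4) is easy to test on random inputs. Below is a minimal sketch, assuming $\mathrm{smaxL1}_t(x) = \mathrm{smax}_t\bigl(\sum_i \sqrt{x_i(e)^2 + t^2}\bigr)$ with $\mathrm{smax}_t$ as above and storing $x$ as a list of $k$ commodity vectors of length $m$; the helper names and the data layout are assumptions made for illustration.

```python
import math
import random


def smax(u, t):
    """t * ln( sum_e (exp(u_e/t) + exp(-u_e/t)) / (2m) ), computed stably."""
    m = len(u)
    shift = max(abs(v) for v in u) / t
    total = sum(math.exp(v / t - shift) + math.exp(-v / t - shift) for v in u)
    return t * (shift + math.log(total / (2 * m)))


def smax_l1(x, t):
    """smaxL1_t(x) for x[i][e] (commodity i, edge e)."""
    k, m = len(x), len(x[0])
    u = [sum(math.sqrt(x[i][e] ** 2 + t * t) for i in range(k)) for e in range(m)]
    return smax(u, t)


def norm_1_inf(x):
    """||x||_{1;inf} = max over edges of the sum over commodities of |x_i(e)|."""
    k, m = len(x), len(x[0])
    return max(sum(abs(x[i][e]) for i in range(k)) for e in range(m))


def sandwich_holds(x, t, tol=1e-9):
    """Check ||x|| - t ln(2m) <= smaxL1_t(x) <= ||x|| + k t."""
    k, m = len(x), len(x[0])
    value = smax_l1(x, t)
    norm = norm_1_inf(x)
    return norm - t * math.log(2 * m) - tol <= value <= norm + k * t + tol


def random_instance(k, m, seed):
    """Hypothetical test input: k commodities on m edges, entries in [-3, 3]."""
    rng = random.Random(seed)
    return [[rng.uniform(-3, 3) for _ in range(m)] for _ in range(k)]
```

The additive error $kt + t\ln(2m)$ shrinks with $t$, which is why the proof of Theorem 49 works with $K = k + \ln(2m)$.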



The last thing needed is to check that the $\#$ operator is easy to compute.

Lemma 48. In $\|\cdot\|_{1;\infty}$, the $\#$ operator is given by the explicit formula

\[
(x^\#)_i(e) =
\begin{cases}
\|x\|_{\infty;1}\,\mathrm{sign}(x_i(e)) & \text{if } i \text{ is the smallest index such that } \max_j |x_j(e)| = |x_i(e)|, \\
0 & \text{otherwise.}
\end{cases}
\]

Proof. It can be proved by direct computation. □
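The direct computation behind the $\#$ operator can be illustrated with a small sketch. It assumes that $x^\#$ maximizes $\langle x, s\rangle - \frac{1}{2}\|s\|_{1;\infty}^2$, that the maximizer puts all weight of each edge on the commodity of largest magnitude, and that its scale is the dual norm $\|x\|_{\infty;1} = \sum_e \max_i |x_i(e)|$. The function names and the list-of-lists layout are illustrative assumptions.

```python
import random


def norm_1_inf(s):
    """max over edges of sum over commodities of |s_i(e)|."""
    k, m = len(s), len(s[0])
    return max(sum(abs(s[i][e]) for i in range(k)) for e in range(m))


def norm_inf_1(x):
    """Dual norm: sum over edges of max over commodities of |x_i(e)|."""
    k, m = len(x), len(x[0])
    return sum(max(abs(x[i][e]) for i in range(k)) for e in range(m))


def inner(x, s):
    return sum(a * b for row_x, row_s in zip(x, s) for a, b in zip(row_x, row_s))


def sharp(x):
    """Candidate maximizer of <x, s> - 0.5 * ||s||_{1;inf}^2."""
    k, m = len(x), len(x[0])
    scale = norm_inf_1(x)
    out = [[0.0] * m for _ in range(k)]
    for e in range(m):
        col = [abs(x[i][e]) for i in range(k)]
        i_star = col.index(max(col))  # smallest index attaining the max
        v = x[i_star][e]
        out[i_star][e] = scale * ((v > 0) - (v < 0))  # scale * sign(v)
    return out


def objective(x, s):
    return inner(x, s) - 0.5 * norm_1_inf(s) ** 2


def beats_random_candidates(x, trials=500, seed=0):
    """True if no random s gives a larger objective than sharp(x)."""
    rng = random.Random(seed)
    k, m = len(x), len(x[0])
    best = objective(x, sharp(x))
    for _ in range(trials):
        s = [[rng.uniform(-10, 10) for _ in range(m)] for _ in range(k)]
        if objective(x, s) > best + 1e-9:
            return False
    return True
```

For $x$ with commodity rows $(1, -2, 0)$ and $(-3, 1, 0)$ this gives $\|x\|_{\infty;1} = 5$ and $x^\#$ with rows $(0, -5, 0)$ and $(-5, 0, 0)$, so that $\langle x, x^\#\rangle = 25 = \|x^\#\|_{1;\infty}^2$, in agreement with Fact 51.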



Now, all the conditions in Assumption 45 are satisfied. Therefore, Theorem 46 and Theorem 19 give us the following theorem:

Theorem 49. Let $G = (V, E, \vec{\mu})$ be an undirected capacitated graph with capacity ratio $U$, and assume $U = \mathrm{poly}(|V|)$. There is an algorithm that finds a $(1 - \epsilon)$-approximate Maximum Concurrent Flow in time

\[ O\left(\frac{k^2}{\epsilon^2}\,|E|\,2^{O\left(\sqrt{\log|V|\log\log|V|}\right)}\right). \]

Proof. Let $A$ be the oblivious routing algorithm given by Theorem 19, so that $\rho(A) \le 2^{O\left(\sqrt{\log|V|\log\log|V|}\right)}$. Let us define the scaled circulation projection matrix $P = I - UAB^T U^{-1}$.

Lemma 12 shows that $\|P\|_\infty \le 1 + 2^{O\left(\sqrt{\log|V|\log\log|V|}\right)}$.

Let the multi-commodity circulation projection matrix $Q : \mathbb{R}^{E\times[k]} \to \mathbb{R}^{E\times[k]}$ be defined by $(Qx)_i = Px_i$. Note that the definition of $\|Q\|_{1;\infty}$ is similar to $\rho(Q)$. By a proof similar to that of Lemma 10, we have $\|Q\|_{1;\infty} = \|P\|_\infty$. Hence, we have $\|Q\|_{1;\infty} \le 1 + 2^{O\left(\sqrt{\log|V|\log\log|V|}\right)}$. Also, since $P$ is a projection matrix onto the subspace $\{x \in \mathbb{R}^E : B^T U x = 0\}$, $Q$ is a projection matrix onto the subspace $\{x \in \mathbb{R}^{E\times[k]} : B^T U x_i = 0 \text{ for all } i\}$.

By Lemma 47, the function $\mathrm{smaxL1}_t(x)$ is a convex continuously differentiable function such that the Lipschitz constant of $\nabla\mathrm{smaxL1}_t$ is $\frac{2}{t}$ and

\[ \|x\|_{1;\infty} - t\ln(2m) \le \mathrm{smaxL1}_t(x) \le \|x\|_{1;\infty} + kt. \]

Given an arbitrary set of demands $\chi_i \in \mathbb{R}^V$, we find a vector $y$ such that

\[ B^T U y_i = -\chi_i \quad \text{for every } i. \]

Then, we use NonlinearProjection to solve

\[ \min_{x \,:\, B^T U x_i = 0 \text{ for all } i} \|x - y\|_{1;\infty} \]

using the family of functions $\mathrm{smaxL1}_t(x) + t\ln(2m)$ and the projection matrix $Q$. Since each iteration involves computing a gradient and the $\#$ operator, each iteration takes $O(mk)$ time. It takes $O\left(\|Q\|_{1;\infty}^2 K/\epsilon^2\right)$ iterations in total, where $K = k + \ln(2m)$. In total, NonlinearProjection outputs a $(1 + \epsilon)$-approximate minimizer $x$ in time

\[ O\left(\frac{k^2}{\epsilon^2}\,|E|\,2^{O\left(\sqrt{\log|V|\log\log|V|}\right)}\right). \]

This gives a $(1 - \epsilon)$-approximate maximum concurrent flow $f_i$ by the formula

\[ f_i = U(x_i - y_i)/\|x - y\|_{1;\infty}. \]

□
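The final rescaling step can be made concrete. The sketch below assumes $U$ is the diagonal matrix of edge capacities and stores $x$ and $y$ as lists of $k$ commodity vectors of length $m$; it only illustrates why dividing by $\|x - y\|_{1;\infty}$ yields a flow whose total load on every edge is at most its capacity. The names `concurrent_flow` and `max_congestion` are hypothetical, and the sketch does not run NonlinearProjection itself.

```python
def concurrent_flow(x, y, cap):
    """f_i = U (x_i - y_i) / ||x - y||_{1;inf} with U = diag(cap)."""
    k, m = len(x), len(x[0])
    diff = [[x[i][e] - y[i][e] for e in range(m)] for i in range(k)]
    scale = max(sum(abs(diff[i][e]) for i in range(k)) for e in range(m))
    if scale == 0:
        raise ValueError("x and y coincide, so there is nothing to rescale")
    return [[cap[e] * diff[i][e] / scale for e in range(m)] for i in range(k)]


def max_congestion(f, cap):
    """Largest ratio of total flow (summed over commodities) to capacity."""
    k, m = len(f), len(f[0])
    return max(sum(abs(f[i][e]) for i in range(k)) / cap[e] for e in range(m))
```

With `x = [[0.5, 0.0], [0.25, 1.0]]`, `y` all zeros and `cap = [2.0, 4.0]`, the scale is 1 and the resulting flow saturates the second edge exactly, so the maximum congestion is 1.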
A Some Facts about Norm and Functions with Lipschitz Gradient

In this section, we present some basic facts about norms and dual norms that are used in this paper. We also present some lemmas about convex functions with Lipschitz gradient. See [ , ] for a comprehensive discussion.

A.1 Norms

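Fact 50.

\[ x = 0 \iff x^\# = 0. \]

Proof. If $x = 0$ then for all $s \ne 0$ we have $\langle x, s\rangle - \frac{1}{2}\|s\|^2 < 0$, but $\langle x, x\rangle - \frac{1}{2}\|x\|^2 = 0$. So we have $x^\# = 0$.

If $x \ne 0$ then let $s = \frac{\langle x, x\rangle}{\|x\|^2}x$; with this choice we have

\[ \langle x, s\rangle - \frac{1}{2}\|s\|^2 = \frac{1}{2}\frac{\langle x, x\rangle^2}{\|x\|^2} > 0. \]

However, for $s = 0$ we have that $\langle x, s\rangle - \frac{1}{2}\|s\|^2 = 0$; therefore we have $x^\# \ne 0$. □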
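Fact 51.

\[ \forall x \in \mathbb{R}^n : \langle x, x^\#\rangle = \|x^\#\|^2. \]

Proof. If $x = 0$ then $x^\# = 0$ by Fact 50 and we have the result. Otherwise, again by Fact 50 we know that $x^\# \ne 0$, and therefore by the definition of $x^\#$ we have

\[ 1 = \arg\max_{c\in\mathbb{R}} \; \langle x, c\cdot x^\#\rangle - \frac{1}{2}\|c\cdot x^\#\|^2 = \arg\max_{c\in\mathbb{R}} \; c\cdot\langle x, x^\#\rangle - \frac{c^2}{2}\|x^\#\|^2. \]

Setting the derivative with respect to $c$ to 0, we get that $1 = c = \frac{\langle x, x^\#\rangle}{\|x^\#\|^2}$. □

Fact 52.

\[ \forall x \in \mathbb{R}^n : \|x\|^* = \|x^\#\|. \]

Proof. Note that if $x = 0$ then the claim follows from Fact 50; otherwise we have

\[ \|x\|^* = \max_{\|y\|\le 1}\langle x, y\rangle = \max_{\|y\| = 1}\langle x, y\rangle \le \max_{y\in\mathbb{R}^n}\frac{\langle x, y\rangle}{\|y\|}. \]

From this it is clear that $\|x\|^* \ge \frac{\langle x, x^\#\rangle}{\|x^\#\|} = \|x^\#\|$. To see the other direction, consider a $y$ that maximizes the above and let $z = \frac{\langle x, y\rangle}{\|y\|^2}\,y$. By the definition of $x^\#$ and Fact 51,

\[ \langle x, z\rangle - \frac{1}{2}\|z\|^2 \le \langle x, x^\#\rangle - \frac{1}{2}\|x^\#\|^2 = \frac{1}{2}\|x^\#\|^2, \]

and therefore

\[ \left(\|x\|^*\right)^2 - \frac{1}{2}\left(\|x\|^*\right)^2 \le \frac{1}{2}\|x^\#\|^2. \]

□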

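Fact 53. [Cauchy Schwarz]

\[ \forall x, y \in \mathbb{R}^n : \langle y, x\rangle \le \|y\|^*\,\|x\|. \]

Proof. By the definition of the dual norm, for all $x$ with $\|x\| = 1$, we have $\langle y, x\rangle \le \|y\|^*$. Hence, the claim follows since both sides scale linearly with $x$. □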
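As a concrete instance of Facts 50 to 53, take the primal norm to be $\|\cdot\|_\infty$, whose dual norm is $\|\cdot\|_1$. In that case one maximizer of $\langle x, s\rangle - \frac{1}{2}\|s\|_\infty^2$ is $x^\# = \|x\|_1\,\mathrm{sign}(x)$. The sketch below checks the facts numerically for this choice; the function names are illustrative, and taking $\mathrm{sign}(0) = 0$ is one of several valid choices for zero coordinates.

```python
import random


def norm_1(x):
    return sum(abs(v) for v in x)


def norm_inf(x):
    return max(abs(v) for v in x)


def inner(x, y):
    return sum(a * b for a, b in zip(x, y))


def sharp_inf(x):
    """A maximizer of <x, s> - 0.5 * ||s||_inf^2, namely ||x||_1 * sign(x)."""
    scale = norm_1(x)
    return [scale * ((v > 0) - (v < 0)) for v in x]


def facts_hold(x, tol=1e-9):
    """Fact 51: <x, x#> = ||x#||^2.  Fact 52: ||x||* = ||x#||."""
    s = sharp_inf(x)
    fact_51 = abs(inner(x, s) - norm_inf(s) ** 2) <= tol
    fact_52 = abs(norm_1(x) - norm_inf(s)) <= tol
    return fact_51 and fact_52


def cauchy_schwarz_holds(trials=200, n=5, seed=0, tol=1e-9):
    """Fact 53 with the dual pair (l_1, l_inf): <y, x> <= ||y||_1 * ||x||_inf."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = [rng.uniform(-4, 4) for _ in range(n)]
        y = [rng.uniform(-4, 4) for _ in range(n)]
        if inner(y, x) > norm_1(y) * norm_inf(x) + tol:
            return False
    return True
```

For $x = (3, -1, 0)$ this gives $x^\# = (4, -4, 0)$ and $\langle x, x^\#\rangle = 16 = \|x^\#\|_\infty^2 = \|x\|_1^2$.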

A.2 Functions with Lipschitz Gradient

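Lemma 54. Let $f$ be a continuously differentiable convex function. Then, the following are equivalent:

\[ \forall x, y \in \mathbb{R}^n : \|\nabla f(x) - \nabla f(y)\|^* \le L\cdot\|x - y\| \]

and

\[ \forall x, y \in \mathbb{R}^n : f(x) \le f(y) + \langle\nabla f(y), x - y\rangle + \frac{L}{2}\|x - y\|^2. \]

For any such $f$ and any $x \in \mathbb{R}^n$, we have

\[ f\left(x - \frac{1}{L}\nabla f(x)^\#\right) \le f(x) - \frac{1}{2L}\left(\|\nabla f(x)\|^*\right)^2. \]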

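Proof. From the first condition, we have

\[
\begin{aligned}
f(y) &= f(x) + \int_0^1 \frac{d}{dt} f(x + t(y - x))\,dt \\
&= f(x) + \int_0^1 \langle\nabla f(x + t(y - x)), y - x\rangle\,dt \\
&= f(x) + \langle\nabla f(x), y - x\rangle + \int_0^1 \langle\nabla f(x + t(y - x)) - \nabla f(x), y - x\rangle\,dt \\
&\le f(x) + \langle\nabla f(x), y - x\rangle + \int_0^1 \|\nabla f(x + t(y - x)) - \nabla f(x)\|^*\,\|y - x\|\,dt \\
&\le f(x) + \langle\nabla f(x), y - x\rangle + \int_0^1 Lt\,\|y - x\|^2\,dt \\
&= f(x) + \langle\nabla f(x), y - x\rangle + \frac{L}{2}\|y - x\|^2.
\end{aligned}
\]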

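Conversely, assume the second condition. For any $x \in \mathbb{R}^n$, let $\phi_x(y) = f(y) - \langle\nabla f(x), y\rangle$. From the convexity of $f$, for any $y \in \mathbb{R}^n$,

\[ f(y) - f(x) \ge \langle\nabla f(x), y - x\rangle. \]

Hence, $x$ is a minimizer of $\phi_x$. Hence, we have

\[
\begin{aligned}
\phi_x(x) &\le \phi_x\left(y - \frac{1}{L}\nabla\phi_x(y)^\#\right) \\
&\le \phi_x(y) - \left\langle\nabla\phi_x(y), \frac{1}{L}\nabla\phi_x(y)^\#\right\rangle + \frac{L}{2}\left\|\frac{1}{L}\nabla\phi_x(y)^\#\right\|^2 \quad \text{(First part of this lemma)} \\
&= \phi_x(y) - \frac{1}{2L}\left\|\nabla\phi_x(y)^\#\right\|^2 \\
&= \phi_x(y) - \frac{1}{2L}\left(\|\nabla\phi_x(y)\|^*\right)^2.
\end{aligned}
\]

Hence,

\[ f(y) \ge f(x) + \langle\nabla f(x), y - x\rangle + \frac{1}{2L}\left(\|\nabla f(y) - \nabla f(x)\|^*\right)^2. \]

Adding up this inequality with $x$ and $y$ interchanged, we have

\[
\begin{aligned}
\frac{1}{L}\left(\|\nabla f(y) - \nabla f(x)\|^*\right)^2 &\le \langle\nabla f(y) - \nabla f(x), y - x\rangle \\
&\le \|\nabla f(y) - \nabla f(x)\|^*\,\|y - x\|.
\end{aligned}
\]

The inequality with $x$ and $y$ interchanged follows from the same argument applied to $\phi_y$. □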
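The decrease guarantee of Lemma 54 can be observed on $\mathrm{smax}_t$ with the norm $\|\cdot\|_\infty$. The sketch below assumes $L = 1/t$ for this function (which a Hessian computation in the style of Lemma 55 supports) and uses $g^\# = \|g\|_1\,\mathrm{sign}(g)$ as the $\#$ operator for $\|\cdot\|_\infty$; the step and check function names are illustrative.

```python
import math
import random


def smax(x, t):
    """t * ln( sum_e (exp(x_e/t) + exp(-x_e/t)) / (2m) ), computed stably."""
    m = len(x)
    shift = max(abs(v) for v in x) / t
    total = sum(math.exp(v / t - shift) + math.exp(-v / t - shift) for v in x)
    return t * (shift + math.log(total / (2 * m)))


def smax_grad(x, t):
    shift = max(abs(v) for v in x) / t
    pos = [math.exp(v / t - shift) for v in x]
    neg = [math.exp(-v / t - shift) for v in x]
    total = sum(pos) + sum(neg)
    return [(p - q) / total for p, q in zip(pos, neg)]


def sharp_inf(g):
    """# operator for the primal norm ||.||_inf: ||g||_1 * sign(g)."""
    scale = sum(abs(v) for v in g)
    return [scale * ((v > 0) - (v < 0)) for v in g]


def sharp_step(x, t):
    """One step x - (1/L) * grad^#, with L = 1/t assumed."""
    g_sharp = sharp_inf(smax_grad(x, t))
    return [xi - t * si for xi, si in zip(x, g_sharp)]


def decrease_holds(x, t, tol=1e-9):
    """Check f(x - (1/L) grad^#) <= f(x) - (1/(2L)) * (||grad||_1)^2."""
    dual_norm = sum(abs(v) for v in smax_grad(x, t))
    return smax(sharp_step(x, t), t) <= smax(x, t) - 0.5 * t * dual_norm ** 2 + tol


def check_random(trials=200, m=5, t=0.4, seed=0):
    rng = random.Random(seed)
    return all(
        decrease_holds([rng.uniform(-5, 5) for _ in range(m)], t)
        for _ in range(trials)
    )
```

Each step therefore makes progress proportional to the squared dual norm of the gradient, which is the quantity the iteration bounds in the paper are built on.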

The next lemma relates the Hessian of a function to the Lipschitz parameter $L$, and it gives us an easy way to compute $L$.

Lemma 55. Let $f$ be a twice differentiable function such that for any $x, y \in \mathbb{R}^n$,

\[ 0 \le y^T\left(\nabla^2 f(x)\right)y \le L\|y\|^2. \]

Then, $f$ is convex and the gradient of $f$ is Lipschitz continuous with Lipschitz parameter $L$.

Proof. Similarly to Lemma 54, we have

\[
\begin{aligned}
f(y) &= f(x) + \langle\nabla f(x), y - x\rangle + \int_0^1 \langle\nabla f(x + t(y - x)) - \nabla f(x), y - x\rangle\,dt \\
&= f(x) + \langle\nabla f(x), y - x\rangle + \int_0^1 t\,(y - x)^T\nabla^2 f(x + \theta_t(y - x))(y - x)\,dt,
\end{aligned}
\]

where $0 \le \theta_t \le t$ comes from the mean value theorem. By the assumption, we have

\[
\begin{aligned}
f(x) + \langle\nabla f(x), y - x\rangle &\le f(y) \\
&\le f(x) + \langle\nabla f(x), y - x\rangle + \int_0^1 tL\,\|y - x\|^2\,dt \\
&= f(x) + \langle\nabla f(x), y - x\rangle + \frac{L}{2}\|y - x\|^2.
\end{aligned}
\]

And the conclusion follows from Lemma 54. □
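As a one-dimensional illustration of Lemma 55, consider $s_3(x) = \sqrt{x^2 + t^2}$ from the proof of Lemma 47. Its second derivative is $t^2/(x^2 + t^2)^{3/2}$, which lies in $(0, 1/t]$, so the lemma gives a $\frac{1}{t}$-Lipschitz derivative. The sketch below checks both statements on a grid; the grid and tolerance are arbitrary illustrative choices.

```python
import math


def s3(x, t):
    return math.sqrt(x * x + t * t)


def s3_prime(x, t):
    """First derivative of sqrt(x^2 + t^2)."""
    return x / math.sqrt(x * x + t * t)


def s3_second(x, t):
    """Second derivative of sqrt(x^2 + t^2)."""
    return t * t / (x * x + t * t) ** 1.5


def hessian_bound_holds(t, points, tol=1e-9):
    """0 <= s3'' <= 1/t at every grid point (hypothesis of Lemma 55, L = 1/t)."""
    return all(0 <= s3_second(x, t) <= 1 / t + tol for x in points)


def gradient_lipschitz_holds(t, points, tol=1e-9):
    """|s3'(a) - s3'(b)| <= |a - b| / t for all pairs (conclusion of Lemma 55)."""
    return all(
        abs(s3_prime(a, t) - s3_prime(b, t)) <= abs(a - b) / t + tol
        for a in points
        for b in points
    )
```

The bound is tight at $x = 0$, where the second derivative equals exactly $1/t$.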